Hi,
Is anyone working on a project on integrating the Oryx model serving layer
with Spark ? Models will be built using either streaming data or batch data
in HDFS and cross-validated with mllib APIs, but the model serving layer
will give API endpoints like Oryx
and read the models maybe from hdfs/impa
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14175167#comment-14175167
]
Debasish Das commented on SPARK-2426:
-
1. [~mengxr] Our legal was clear that Stan
Hi,
I am validating the proximal algorithm for positive and bound constrained
ALS and I came across the bug detailed in the JIRA while running ALS with
NNLS:
https://issues.apache.org/jira/browse/SPARK-3987
The ADMM-based proximal algorithm came up with the correct result...
Thanks.
Deb
Debasish Das created SPARK-3987:
---
Summary: NNLS generates incorrect result
Key: SPARK-3987
URL: https://issues.apache.org/jira/browse/SPARK-3987
Project: Spark
Issue Type: Bug
It will really help if spark users can point to github examples that
integrated spark and play...specifically SparkSQL and Play...
On Thu, Oct 16, 2014 at 9:23 AM, Mohammed Guller
wrote:
> Daniel,
>
> Thanks for sharing this. It is very helpful.
>
>
>
> The reason I want to use Spark subm
Just checked, QR is exposed by netlib: import org.netlib.lapack.Dgeqrf
For the equality and bound version, I will use QR...it will be faster than
the LU that I am using through jblas.solveSymmetric...
On Thu, Oct 16, 2014 at 8:34 AM, Debasish Das
wrote:
> @xiangrui should we add this epsi
benchmarked that but I opted for QR in a different implementation and it
> has worked fine.
>
> Now I have to go hunt for how the QR decomposition is exposed in BLAS...
> Looks like its GEQRF which JBLAS helpfully exposes. Debasish you could try
> it for fun at least.
> On Oct 15,
ct 15, 2014 at 5:01 PM, Liquan Pei wrote:
> Hi Debaish,
>
> I think ||r - w_i' h_j||^2 is positive semi-definite.
>
> Thanks,
> Liquan
>
> On Wed, Oct 15, 2014 at 4:57 PM, Debasish Das
> wrote:
>
>> Hi,
>>
>> If I take the Movielens data and run the
Hi,
If I take the Movielens data and run the default ALS with regularization set to
0.0, I am hitting an exception from LAPACK that the Gram matrix is not
positive definite. This is on the master branch.
This is how I run it :
./bin/spark-submit --total-executor-cores 1 --master spark://
tusca09lmlvt00
Awesome news Matei !
Congratulations to the databricks team and all the community members...
On Fri, Oct 10, 2014 at 7:54 AM, Matei Zaharia
wrote:
> Hi folks,
>
> I interrupt your regularly scheduled user / dev list to bring you some
> pretty cool news for the project, which is that we've been
I have faced this in the past and I had to put a profile -Phadoop-2.3...
mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn -DskipTests install
On Wed, Oct 8, 2014 at 1:40 PM, Chuang Liu wrote:
> Hi:
>
> I tried to build Spark (1.1.0) with hadoop 2.4.0, and ran a simple
> wordcount example
N
> log4j.logger.kafka=WARN
> log4j.logger.akka=WARN
> log4j.logger.org.apache.spark=WARN
> log4j.logger.org.apache.spark.storage.BlockManager=ERROR
> log4j.logger.org.apache.zookeeper=WARN
> log4j.logger.org.eclipse.jetty=WARN
> log4j.logger.org.I0Itec.zkclient=WARN
>
> On Tue, Oct 7, 2014 at 7:42 PM, Deb
Hi,
I have added some changes to ALS tests and I am re-running tests as:
mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn
-DwildcardSuites=org.apache.spark.mllib.recommendation.ALSSuite test
I have some INFO logs in the code which I want to see on my console. They
work fine if I add print
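In case it is useful, a hedged guess at the knob involved: the test runs pick up a
test-scoped log4j.properties (I am assuming mllib/src/test/resources/log4j.properties
here), and pointing the root category at the console should surface the INFO lines
without sprinkling print statements:

# assumed location: mllib/src/test/resources/log4j.properties
log4j.rootCategory=INFO, console
log4j.appender.console=org.apache.log4j.ConsoleAppender
log4j.appender.console.layout=org.apache.log4j.PatternLayout
log4j.appender.console.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n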
Another rule of thumb is to definitely cache the RDD over which you need
to do iterative analysis...
For the rest of them, only cache if you have a lot of free memory !
On Mon, Oct 6, 2014 at 2:39 PM, Sean Owen wrote:
> I think you mean that data2 is a function of data1 in the first
> example. I ima
Hi,
We write the output of models and other information as parquet files and
later we let data APIs run SQL queries on the columnar data...
SparkSQL is used to dump the data in parquet format and now we are
considering whether to use SparkSQL or Impala to read it back...
I came across this benchm
If the missing values are 0, then you can also look into implicit
formulation...
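A minimal sketch of what I mean by the implicit formulation (the rank, iterations,
lambda and alpha values below are placeholders, not tuned numbers):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.recommendation.{ALS, Rating}

// missing entries are treated as zero-preference, low-confidence observations
// instead of values that need to be imputed
def trainImplicitModel(sc: SparkContext) = {
  val ratings = sc.parallelize(Seq(Rating(1, 1, 1.0), Rating(1, 2, 3.0), Rating(2, 1, 2.0)))
  ALS.trainImplicit(ratings, 10, 10, 0.01, 1.0)  // rank, iterations, lambda, alpha
}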
On Tue, Sep 30, 2014 at 12:05 PM, Xiangrui Meng wrote:
> We don't handle missing value imputation in the current version of
> MLlib. In future releases, we can store feature information in the
> dataset metadata, wh
Can't you extend a class in place of an object, which can be generic ?
class GenericAccumulator[B] extends AccumulatorParam[Seq[B]] {
}
On Wed, Oct 1, 2014 at 3:38 AM, Johan Stenberg
wrote:
> Just realized that, of course, objects can't be generic, but how do I
> create a generic AccumulatorParam?
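One possible way to flesh that out, as a sketch against the Spark 1.x accumulator
API (the class name and method bodies are my assumption of what was intended):

import org.apache.spark.AccumulatorParam

// a generic accumulator parameter that concatenates Seq[B] contributions
class SeqAccumulatorParam[B] extends AccumulatorParam[Seq[B]] {
  def zero(initialValue: Seq[B]): Seq[B] = Seq.empty[B]
  def addInPlace(r1: Seq[B], r2: Seq[B]): Seq[B] = r1 ++ r2
}

// usage, assuming an existing SparkContext sc:
// val acc = sc.accumulator(Seq.empty[Int])(new SeqAccumulatorParam[Int])
// sc.parallelize(1 to 10).foreach(x => acc += Seq(x))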
Only fit the data in memory where you want to run the iterative
algorithm...
For map-reduce operations, it's better not to cache if you have a memory
crunch...
Also schedule the persist and unpersist such that you utilize the RAM
well...
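A small, purely illustrative sketch of that persist/unpersist scheduling (none of
this code is from the thread):

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.storage.StorageLevel

def iterativeJob(sc: SparkContext): Double = {
  // cache only the RDD that the iterative loop actually reuses
  val working = sc.parallelize(1 to 1000000).map(_ * 2.0).persist(StorageLevel.MEMORY_ONLY)
  var estimate = 0.0
  for (_ <- 1 to 10) {
    estimate = working.sum() / working.count()
  }
  working.unpersist()  // give the RAM back as soon as the loop is done
  estimate
}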
On Tue, Sep 30, 2014 at 4:34 PM, Liquan Pei wrote:
> Hi
If the tree is too big, build it on graphx...but it will need thorough
analysis so that the partitions are well balanced...
On Tue, Sep 30, 2014 at 2:45 PM, Andy Twigg wrote:
> Hi Boromir,
>
> Assuming the tree fits in memory, and what you want to do is parallelize
> the computation, the 'obviou
I have done mvn clean several times...
Consistently, all the mllib tests that use
LocalClusterSparkContext.scala fail !
Hi,
Inside mllib I am running tests using:
mvn -Dhadoop.version=2.3.0-cdh5.1.0 -Phadoop-2.3 -Pyarn install
The local tests run fine but the cluster tests are failing...
LBFGSClusterSuite:
- task size should be small *** FAILED ***
org.apache.spark.SparkException: Job aborted due to stage failure
You should look into Evan Sparks' talk from Spark Summit 2014
http://spark-summit.org/2014/talk/model-search-at-scale
I am not sure if some of it is already open sourced through MLBase...
On Mon, Sep 29, 2014 at 7:45 PM, Lochana Menikarachchi
wrote:
> Hi,
>
> Is there anyone who works on hyper
ep 24, 2014 at 9:41 AM, Debasish Das
wrote:
> spark SQL reads parquet file fine...did you follow one of these to
> read/write parquet from spark ?
>
> http://zenfractal.com/2013/08/21/a-powerful-big-data-trio/
>
> On Wed, Sep 24, 2014 at 9:29 AM, Ted Yu wrote:
>
>> Addi
>>> I was thinking along the same line.
>>>
>>> Jianshi:
>>> See
>>> http://hbase.apache.org/book.html#d0e6369
>>>
>>> On Wed, Sep 24, 2014 at 8:56 AM, Debasish Das
>>> wrote:
>>>
>>>> HBase regionserver nee
HBase regionserver needs to be balanced...you might have some skewness in
row keys and one regionserver is under pressure...try finding that key and
replicate it using a random salt
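A tiny sketch of the salting idea (the helper name and the number of salt buckets
are made up for illustration):

// prefix the hot row key with a small bucket id so writes spread across region servers;
// readers then fan out over the numSalts prefixes for that key
val numSalts = 16
def saltedKey(rowKey: String): String = {
  val bucket = (rowKey.hashCode & Int.MaxValue) % numSalts
  f"$bucket%02d-$rowKey"
}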
On Wed, Sep 24, 2014 at 8:51 AM, Jianshi Huang
wrote:
> Hi Ted,
>
> It converts RDD[Edge] to HBase rowkey and colu
ay not happen to have
> exhibited in your test.
>
> On Sun, Sep 21, 2014 at 2:13 AM, Debasish Das
> wrote:
> > Some more debug revealed that as Sean said I have to keep the
> dictionaries
> > persisted till I am done with the RDD manipulation.
> >
> > Thank
the DAG to speculate such things...similar to branch
prediction ideas from comp arch...
On Sat, Sep 20, 2014 at 1:56 PM, Debasish Das
wrote:
> I changed zipWithIndex to zipWithUniqueId and that seems to be working...
>
> What's the difference between zipWithIndex vs zipWithUniq
's not
very clear from docs...
On Sat, Sep 20, 2014 at 1:48 PM, Debasish Das
wrote:
> I did not persist / cache it as I assumed zipWithIndex will preserve
> order...
>
> There is also zipWithUniqueId...I am trying that...If that also shows the
> same issue, we should make it c
is being used to assign IDs. From a
> recent JIRA discussion I understand this is not deterministic within a
> partition so the index can be different when the RDD is reevaluated. If you
> need it fixed, persist the zipped RDD on disk or in memory.
> On Sep 20, 2014 8:10 PM, "
Hi,
I am building a dictionary of RDD[(String, Long)] and after the dictionary
is built and cached, I find key "almonds" at value 5187 using:
rdd.filter{case(product, index) => product == "almonds"}.collect
Output:
Debug product almonds index 5187
Now I take the same dictionary and write it out
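For reference, the workaround that came up in this thread (zipWithUniqueId plus
persisting), as a sketch with made-up input; persisting the zipped RDD before using
the ids is the important part:

import org.apache.spark.SparkContext

def buildDictionary(sc: SparkContext) = {
  val products = sc.parallelize(Seq("almonds", "walnuts", "cashews"))
  // persist so the assigned ids stay fixed when the RDD is re-evaluated
  val dictionary = products.distinct().zipWithUniqueId().persist()
  dictionary.filter { case (product, id) => product == "almonds" }.collect()
}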
Thanks Christoph.
Are these numbers for mllib als implicit and explicit feedback on
movielens/netflix datasets documented on JIRA ?
On Sep 19, 2014 1:16 PM, "Christoph Sawade" <
christoph.saw...@googlemail.com> wrote:
> Hey Deb,
>
> NDCG is the "Normalized Discounted Cumulative Gain" [1]. Anothe
Hi Xiangrui,
Could you please point to some reference for calculating prec@k and ndcg@k ?
prec is precision I suppose but ndcg I have no idea about...
Thanks.
Deb
On Mon, Aug 25, 2014 at 12:28 PM, Xiangrui Meng wrote:
> The evaluation metrics are definitely useful. How do they differ from
>
The PR will be updated
> today.
> Best,
> Reza
>
> On Thu, Sep 18, 2014 at 2:06 PM, Debasish Das
> wrote:
>
>> Hi Reza,
>>
>> Have you tested if different runs of the algorithm produce different
>> similarities (basically if the algorithm is deterministic) ?
. We can add jaccard and other similarity measures in
> later PRs.
>
> In the meantime, you can un-normalize the cosine similarities to get the
> dot product, and then compute the other similarity measures from the dot
> product.
>
> Best,
> Reza
>
>
> On Wed, S
sc.parallelize(model.weights.toArray, blocks).top(k) will get that right ?
For logistic you might want both positive and negative features...so just
pass it through a filter on abs and then pick top(k)
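A quick sketch of that filter-then-top idea (the helper name and the ordering by
absolute weight are mine, not an existing mllib API):

import org.apache.spark.SparkContext

// returns the k (weight, featureIndex) pairs with the largest |weight|
def topFeatures(sc: SparkContext, weights: Array[Double], k: Int): Array[(Double, Int)] =
  sc.parallelize(weights.zipWithIndex)
    .top(k)(Ordering.by { case (w, _) => math.abs(w) })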
On Thu, Sep 18, 2014 at 10:30 AM, Sameer Tilak wrote:
> Hi All,
>
> I am able to run LinearReg
n the meantime, you can un-normalize the cosine similarities to get the
> dot product, and then compute the other similarity measures from the dot
> product.
>
> Best,
> Reza
>
>
> On Wed, Sep 17, 2014 at 6:52 PM, Debasish Das
> wrote:
>
>> Hi Reza,
>>
>
Hi,
I have some RowMatrices all with the same key (MatrixEntry.i,
MatrixEntry.j) and I would like to join multiple matrices to come up with a
sqlTable for each key...
What's the best way to do it ?
Right now I am doing N joins if I want to combine data from N matrices
which does not look quite r
We dump fairly big libsvm to compare against liblinear/libsvm...the
following code dumps out libsvm format from SparseVector...
def toLibSvm(features: SparseVector): String = {
val indices = features.indices.map(_ + 1)
val values = features.values
indices.zip(values).mkString(" ").r
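For what it is worth, one way the truncated snippet above could be completed
(my guess at the cut-off part; libsvm expects 1-based index:value pairs):

import org.apache.spark.mllib.linalg.SparseVector

def toLibSvm(features: SparseVector): String = {
  val indices = features.indices.map(_ + 1)  // libsvm indices are 1-based
  val values = features.values
  indices.zip(values).map { case (i, v) => s"$i:$v" }.mkString(" ")
}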
RowMatrix and CoordinateMatrix to be templated on the value...
Are you considering this in your design ?
Thanks.
Deb
On Tue, Sep 9, 2014 at 9:45 AM, Reza Zadeh wrote:
> Better to do it in a PR of your own, it's not sufficiently related to
> dimsum
>
> On Tue, Sep 9, 2014 at 7:03
Congratulations on the 1.1 release !
On Thu, Sep 11, 2014 at 9:08 PM, Matei Zaharia
wrote:
> Thanks to everyone who contributed to implementing and testing this
> release!
>
> Matei
>
> On September 11, 2014 at 11:52:43 PM, Tim Smith (secs...@gmail.com) wrote:
>
> Thanks for all the good work. V
ALS is using a bunch of off-heap memory?). You mentioned
> earlier in this thread that the property wasn't showing up in the
> Environment tab. Are you sure it's making it in?
>
> -Sandy
>
> On Tue, Sep 9, 2014 at 11:58 AM, Debasish Das
> wrote:
>
>> Hmm...I d
.
>
> -Sandy
>
> On Tue, Sep 9, 2014 at 7:32 AM, Debasish Das
> wrote:
>
>> Hi Sandy,
>>
>> Any resolution for YARN failures ? It's a blocker for running spark on
>> top of YARN.
>>
>> Thanks.
>> Deb
>>
>> On Tue, Aug 19,
21 . We know that the
> container got killed by YARN because it used much more memory than it
> requested. But we haven't figured out the root cause yet.
>
> +Sandy
>
> Best,
> Xiangrui
>
> On Tue, Aug 19, 2014 at 8:51 PM, Debasish Das
> wrote:
> > Hi,
>
her one. For dense matrices with say, 1m
> columns this won't be computationally feasible and you'll want to start
> sampling with dimsum.
>
> It would be helpful to have a loadRowMatrix function, I would use it.
>
> Best,
> Reza
>
> On Tue, Sep 9, 2014 at 12:05
y in a future PR, probably
> still for 1.2
>
>
> On Fri, Sep 5, 2014 at 9:15 PM, Debasish Das
> wrote:
>
>> Awesome...Let me try it out...
>>
>> Any plans of putting other similarity measures in future (jaccard is
>> something that will be useful) ? I gue
how to do linear programming in a distributed way.
> -Xiangrui
>
> On Mon, Sep 8, 2014 at 7:12 AM, Debasish Das
> wrote:
> > Xiangrui,
> >
> > Should I open up a JIRA for this ?
> >
> > Distributed lp/socp solver through ecos/ldl/amd ?
> >
> > I c
e jni version of ldl and
amd which are lgpl...
Let me know.
Thanks.
Deb
On Sep 8, 2014 7:04 AM, "Debasish Das" wrote:
> Durin,
>
> I have integrated ecos with spark which uses suitesparse under the hood
> for linear equation solves...I have exposed only the qp solver
Durin,
I have integrated ecos with spark which uses suitesparse under the hood for
linear equation solves...I have exposed only the qp solver api in spark
since I was comparing ip with proximal algorithms but we can expose the
suitesparse api as well...jni is used to load up ldl amd and ecos librarie
sum with gamma as PositiveInfinity turns it
> into the usual brute force algorithm for cosine similarity, there is no
> sampling. This is by design.
>
>
> On Fri, Sep 5, 2014 at 8:20 PM, Debasish Das
> wrote:
>
>> I looked at the code: similarColumns(Double.posIn
ring (perhaps after dimensionality
> reduction) if your goal is to find batches of similar points instead of all
> pairs above a threshold.
>
>
>
>
> On Fri, Sep 5, 2014 at 8:02 PM, Debasish Das
> wrote:
>
>> Also for tall and wide (rows ~60M, columns 10M), I am conside
Also for tall and wide (rows ~60M, columns 10M), I am considering running a
matrix factorization to reduce the dimension to say ~60M x 50 and then run
all pair similarity...
Did you also try similar ideas and see positive results ?
On Fri, Sep 5, 2014 at 7:54 PM, Debasish Das
wrote:
>
you don't have to redo your code. Your call if you need it before a week.
> Reza
>
>
> On Fri, Sep 5, 2014 at 7:43 PM, Debasish Das
> wrote:
>
>> Ohh cool...all-pairs brute force is also part of this PR ? Let me pull
>> it in and test on our dataset...
>>
e/spark/pull/1778
>
> Your question wasn't entirely clear - does this answer it?
>
> Best,
> Reza
>
>
> On Fri, Sep 5, 2014 at 6:14 PM, Debasish Das
> wrote:
>
>> Hi Reza,
>>
>> Have you compared with the brute force algorithm for sim
Hi Reza,
Have you compared with the brute force algorithm for similarity computation
with something like the following in Spark ?
https://github.com/echen/scaldingale
I am adding cosine similarity computation but I do want to compute all-pair
similarities...
Note that the data is sparse for
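A sketch of the two paths being compared here, written against the RowMatrix API
from the DIMSUM work (toy input; the 0.1 threshold is arbitrary):

import org.apache.spark.SparkContext
import org.apache.spark.mllib.linalg.Vectors
import org.apache.spark.mllib.linalg.distributed.RowMatrix

def compareSimilarities(sc: SparkContext) = {
  val rows = sc.parallelize(Seq(
    Vectors.sparse(4, Seq((0, 1.0), (2, 2.0))),
    Vectors.sparse(4, Seq((1, 3.0), (3, 1.0)))))
  val mat = new RowMatrix(rows)
  val exact  = mat.columnSimilarities()     // brute-force cosine over all column pairs
  val approx = mat.columnSimilarities(0.1)  // DIMSUM sampling with threshold 0.1
  (exact, approx)
}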
Breeze author David also has a github project on cuda binding in
scala...do you prefer using java or scala ?
On Aug 27, 2014 2:05 PM, "Frank van Lankvelt"
wrote:
> you could try looking at ScalaCL[1], it's targeting OpenCL rather than
> CUDA, but that might be close enough?
>
> cheers, Frank
>
Hi Burak,
This LDA implementation is friendly to the equality and positivity als code
that I added in the following JIRA to formulate robust plsa
https://issues.apache.org/jira/plugins/servlet/mobile#issue/SPARK-2426
Should I build upon the PR that you pointed ? I want to run some
experiment
odeManager
> configuration, yarn.nodemanager.vmem-check-enabled is set to false.
>
> -Sandy
>
>
> On Wed, Aug 20, 2014 at 12:27 AM, Debasish Das
> wrote:
>
>> I could reproduce the issue in both 1.0 and 1.1 using YARN...so this is
>> definitely a YARN related problem...
sai.com
> LinkedIn: https://www.linkedin.com/in/dbtsai
>
>
> On Wed, Aug 20, 2014 at 3:19 PM, Debasish Das
> wrote:
> > Hi Patrick,
> >
> > Last few days I came across some bugs which got exposed due to ALS runs
> on
> > large scale data...although it
rk's actor system
> directly - it is an internal communication component in Spark and could
> e.g. be re-factored later to not use akka at all. Could you elaborate a bit
> more on your use case?
>
> - Patrick
>
>
> On Wed, Aug 20, 2014 at 9:02 AM, Debasish Das
> wrot
Hi,
There have been some recent changes in the way akka is used in spark and I
feel they are major changes...
Is there a design document / JIRA / experiment on large datasets that
highlight the impact of changes (1.0 vs 1.1) ? Basically it will be great
to understand where akka is used in the cod
issue as described in
> https://issues.apache.org/jira/browse/SPARK-2121 . We know that the
> container got killed by YARN because it used much more memory than it
> requested. But we haven't figured out the root cause yet.
>
> +Sandy
>
> Best,
> Xiangrui
>
> O
Hi,
During the 4th ALS iteration, I am noticing that one of the executor gets
disconnected:
14/08/19 23:40:00 ERROR network.ConnectionManager: Corresponding
SendingConnectionManagerId not found
14/08/19 23:40:00 INFO cluster.YarnClientSchedulerBackend: Executor 5
disconnected, so removing it
14
ed on YARN ?
@dbtsai did your assembly on YARN run fine or are you still noticing these
exceptions ?
Thanks.
Deb
On Thu, Aug 14, 2014 at 5:48 PM, Reynold Xin wrote:
> Here: https://github.com/apache/spark/pull/1948
>
>
>
> On Thu, Aug 14, 2014 at 5:45 PM, Debasish Das
> wro
With the fixes, I could run it fine on top of branch-1.0
On master when running on YARN I am getting another KryoException:
Exception in thread "main" org.apache.spark.SparkException: Job aborted due
to stage failure: Task 247 in stage 52.0 failed 4 times, most recent
failure: Lost task 247.3 in
Hi,
We are running the snapshots (new spark features) on YARN and I was
wondering if the webui is available on YARN mode...
The deployment document does not mention webui on YARN mode...
Is it available ?
Thanks.
Deb
Hi Wei,
Sparkler code was not available for benchmarking and so I picked up
Jellyfish, which uses SGD, and if you look at the paper, the ideas are very
similar to the sparkler paper, but Jellyfish is on shared memory and uses C code
while sparkler was built on top of spark...Jellyfish used some interesti
Hi Brandon,
Looks very cool...will try it out for ad-hoc analysis of our datasets and
provide more feedback...
Could you please give a bit more details about the differences of the Spindle
architecture compared to the Hue + Spark integration (python stack) and Ooyala
Jobserver ?
Does Spindle allow sharing
Hi,
Are there any experiments detailing the performance hit due to HDFS
checkpoint in ALS ?
As we scale to large ranks with more ratings, I believe we have to cut the
RDD lineage to safeguard against the lineage issue...
Thanks.
Deb
DB,
Did you compare softmax regression with one-vs-all and find that softmax
is better ?
one-vs-all can be implemented as a wrapper over the binary classifiers that we
have in mllib...I am curious if softmax multinomial is better in most cases
or is it worthwhile to add a one vs all version of mlor a
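A rough sketch of that wrapper idea (entirely illustrative, not an existing mllib
API; the names and the 100 iterations are placeholders):

import org.apache.spark.mllib.classification.LogisticRegressionWithSGD
import org.apache.spark.mllib.regression.LabeledPoint
import org.apache.spark.rdd.RDD

// train one binary model per class; predict with whichever model gives the highest raw score
def trainOneVsAll(data: RDD[LabeledPoint], numClasses: Int) =
  (0 until numClasses).map { c =>
    val binary = data.map(p => LabeledPoint(if (p.label == c) 1.0 else 0.0, p.features))
    val model = LogisticRegressionWithSGD.train(binary, 100)
    model.clearThreshold()  // keep raw scores so the per-class models are comparable
    (c, model)
  }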
5:48 PM, "Reynold Xin" wrote:
> Here: https://github.com/apache/spark/pull/1948
>
>
>
> On Thu, Aug 14, 2014 at 5:45 PM, Debasish Das
> wrote:
>
>> Is there a fix that I can test ? I have the flows setup for both
>> standalone and YARN runs...
>
't have the whole context and obviously I haven't spent nearly
>>>>> as much time on this as you have, but I'm wondering what if we always pass
>>>>> the executor's ClassLoader to the Kryo serializer? Will that solve this
>>>>> proble
Hi,
For our large ALS runs, we are considering using sc.setCheckpointDir so
that the intermediate factors are written to HDFS and the lineage is
broken...
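A minimal sketch of the setup I have in mind (the HDFS path is a placeholder):

import org.apache.spark.SparkContext

def enableCheckpointing(sc: SparkContext): Unit = {
  sc.setCheckpointDir("hdfs:///tmp/als-checkpoints")
}

// an intermediate RDD in the iterative loop can then cut its lineage explicitly, e.g.
//   factors.checkpoint(); factors.count()  // the action forces the checkpoint to materialize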
Is there a comparison which shows the performance degradation due to these
options ? If not I will be happy to add experiments with it...
Tha
Actually I faced it yesterday...
I had to put it in spark-env.sh and take it out from spark-defaults.conf on
1.0.1...Note that this setting should be visible on all workers...
After that I validated that SPARK_LOCAL_DIRS was indeed getting used for
shuffling...
On Thu, Aug 14, 2014 at 10:27 AM,
the default).
>>> Theoretically Spark supports custom serialisers, but due to a related
>>> issue, custom serialisers currently can't live in application jars and must
>>> be available to all executors at launch. My PR fixes this issue as well,
>>> allowin
Sorry I just saw Graham's email after sending my previous email about this
bug...
I have been seeing this same issue on our ALS runs last week but I thought
it was due to my hacky way of running the mllib 1.1 snapshot on core 1.0...
What's the status of this PR ? Will this fix be back-ported to 1.0.1 as we
Hi,
Is there a JIRA for this bug ?
I have seen it multiple times during our ALS runs now...some runs don't
show it while some runs fail due to the error msg
https://github.com/GrahamDennis/spark-kryo-serialisation/blob/master/README.md
One way to circumvent this is to not use kryo but then I am no
Hi,
I have set up the SPARK_LOCAL_DIRS option in spark-env.sh so that Spark can
use more shuffle space...
Does Spark clean all the shuffle files once the runs are done ? Seems to
me that the shuffle files are not cleaned...
Do I need to set this variable ? spark.cleaner.ttl
Right now we are pl
Dennis,
If it is PLSA with least squares loss then the QuadraticMinimizer that we
open sourced should be able to solve it for a modest number of topics (up to 1000 I
believe)...if we integrate a cg solver for equality constraints (Nocedal's KNITRO paper
is the reference) the topic size can be increased much larger than ALS
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095232#comment-14095232
]
Debasish Das edited comment on SPARK-2426 at 8/13/14 3:3
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
]
Debasish Das updated SPARK-2426:
Description:
Current ALS supports least squares and nonnegative least squares.
I presented ADMM
[
https://issues.apache.org/jira/browse/SPARK-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14095232#comment-14095232
]
Debasish Das commented on SPARK-2426:
-
Hi Xiangrui,
The branch is ready fo
I figured out the issue...the driver memory was at 512 MB and for our
datasets, the following code needed more memory...
// Materialize usersOut and productsOut.
usersOut.count()
productsOut.count()
Thanks.
Deb
On Sat, Aug 9, 2014 at 6:12 PM, Debasish Das
wrote:
> Actually nope it
LS
locally...Most likely it is a bug
Thanks.
Deb
On Sat, Aug 9, 2014 at 11:12 AM, Debasish Das
wrote:
> Including mllib inside assembly worked fine...If I deploy only the core
> and send mllib as --jars then this problem shows up...
>
> Xiangrui could you please comment if it is a bu
wrote:
> I was having this same problem early this week and had to include my
> changes in the assembly.
>
>
> On Sat, Aug 9, 2014 at 9:59 AM, Debasish Das
> wrote:
>
>> I validated that I can reproduce this problem with master as well (without
>> adding an
:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
I will try now with mllib inside the assembly...If that works then
something is weird here !
On Sat, Aug 9, 2014 at 12:46 AM, Debasish Das
wrote:
> Hi Xiangrui,
>
> Based on your suggestion I moved core and
piled with Java 1.7_55 but
the cluster JRE is at 1.7_45.
Thanks.
Deb
On Wed, Aug 6, 2014 at 12:01 PM, Debasish Das
wrote:
> I did not play with Hadoop settings...everything is compiled with
> 2.3.0CDH5.0.2 for me...
>
> I did try to bump the version number of HBase from 0.94 to 0.
Hi Patrick,
I am testing the 1.1 branch but I see lot of protobuf warnings while
building the jars:
[warn] Class com.google.protobuf.Parser not found - continuing with a stub.
[warn] Class com.google.protobuf.Parser not found - continuing with a stub.
[warn] Class com.google.protobuf.Parser not
> One related question, is the mllib jar independent from the hadoop version (doesn't
> use the hadoop api directly)? Can I use an mllib jar compiled for one version of
> hadoop and use it with another version of hadoop?
>
> Sent from my Google Nexus 5
> On Aug 6, 2014 8:29 AM, "Debasish D
I'm really interested in how
> they differ in the final recommendation? It would be great if you can
> test prec@k or ndcg@k metrics.
>
> Best,
> Xiangrui
>
> On Wed, Aug 6, 2014 at 8:28 AM, Debasish Das
> wrote:
> > Hi Xiangrui,
> >
> > Maintaining another
scala.concurrent.forkjoin.ForkJoinPool.runWorker(ForkJoinPool.java:1979)
at
scala.concurrent.forkjoin.ForkJoinWorkerThread.run(ForkJoinWorkerThread.java:107)
On Tue, Aug 5, 2014 at 5:59 PM, Debasish Das
wrote:
> Hi Xiangrui,
>
> I used your idea and kept a cherry picked version of ALS.sc
:
> If you cannot change the Spark jar deployed on the cluster, an easy
> solution would be renaming ALS in your jar. If userClassPathFirst
> doesn't work, could you create a JIRA and attach the log? Thanks!
> -Xiangrui
>
> On Tue, Aug 5, 2014 at 9:10 AM, Debasish Das
>
rst is behaving, there might be bugs in it...
Any suggestions will be appreciated
Thanks.
Deb
On Sat, Aug 2, 2014 at 11:12 AM, Xiangrui Meng wrote:
> Yes, that should work. spark-mllib-1.1.0 should be compatible with
> spark-core-1.0.1.
>
> On Sat, Aug 2, 2014 at 10:54 AM, Debasi
'm not
> sure whether it could solve your problem. -Xiangrui
>
> On Sat, Aug 2, 2014 at 10:13 AM, Debasish Das
> wrote:
> > Hi,
> >
> > I have deployed spark stable 1.0.1 on the cluster but I have new code
> that
> > I added in mllib-1.1.0-SNAPSHOT.
> &g
Hi,
I have deployed spark stable 1.0.1 on the cluster but I have new code that
I added in mllib-1.1.0-SNAPSHOT.
I am trying to access the new code using spark-submit as follows:
spark-job --class com.verizon.bda.mllib.recommendation.ALSDriver
--executor-memory 16g --total-executor-cores 16 --jar
Hi Aureliano,
Will it be possible for you to give the test-case ? You can add it to JIRA
as well as an attachment I guess...
I am preparing the PR for ADMM based QuadraticMinimizer...In my matlab
experiments with scaling the rank to 1000 and beyond (which is too high for
ALS but gives a good idea
I found the issue...
If you use spark git and generate the assembly jar then
org.apache.hadoop.io.Writable.class is packaged with it
If you use the assembly jar that ships with CDH in
/opt/cloudera/parcels/CDH/lib/spark/assembly/lib/spark-assembly_2.10-0.9.0-cdh5.0.2-hadoop2.3.0-cdh5.0.2.jar,
Hi,
We have been using standalone spark for last 6 months and I used to run
application jars fine on spark cluster with the following command.
java -cp
":/app/data/spark_deploy/conf:/app/data/spark_deploy/lib/spark-assembly-1.0.0-SNAPSHOT-hadoop2.0.0-mr1-cdh4.5.0.jar:./app.jar"
-Xms2g -Xmx2g -Ds
[
https://issues.apache.org/jira/browse/SPARK-2602?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14068094#comment-14068094
]
Debasish Das commented on SPARK-2602:
-
CDH5 does not even support java6 any