Re: Beam Summit Case Study

2024-09-10 Thread Debasish Das via dev
> On Mon, Sep 9, 2024 at 9:30 PM Debasish Das via dev > wrote: > >> Hi, >> >> This is Debasish and we discussed at the beam summit today about credit >> karma case studies. >> >> We have done some beam summit talks and case studies in 2022 >> >> h

Re: Welcome two new Apache Spark committers

2023-08-06 Thread Debasish Das
Congratulations Peter and Xiduo. On Sun, Aug 6, 2023, 7:05 PM Wenchen Fan wrote: > Hi all, > > The Spark PMC recently voted to add two new committers. Please join me in > welcoming them to their new role! > > - Peter Toth (Spark SQL) > - Xiduo You (Spark SQL) > > They consistently make contribut

Re: Offline elastic index creation

2022-11-10 Thread Debasish Das
Hi Vibhor, We worked on a project to create lucene indexes using spark but the project has not been managed for some time now. If there is interest we can resurrect it https://github.com/vsumanth10/trapezium/blob/master/dal/src/test/scala/com/verizon/bda/trapezium/dal/lucene/LuceneIndexerSuite.sc

Re: Welcome Xinrong Meng as a Spark committer

2022-08-10 Thread Debasish Das
Congratulations Xinrong ! On Tue, Aug 9, 2022, 10:00 PM Rui Wang wrote: > Congrats Xinrong! > > > -Rui > > On Tue, Aug 9, 2022 at 8:57 PM Xingbo Jiang wrote: > >> Congratulations! >> >> Yuanjian Li 于2022年8月9日 周二20:31写道: >> >>> Congratulations, Xinrong! >>> >>> XiDuo You 于2022年8月9日 周二19:18写道: >>

Re: SIGMOD System Award for Apache Spark

2022-05-15 Thread Debasish Das
Congratulations to the whole spark community ! It's a great achievement. On Sat, May 14, 2022, 2:49 AM Yikun Jiang wrote: > Awesome! Congrats to the whole community! > > On Fri, May 13, 2022 at 3:44 AM Matei Zaharia > wrote: > >> Hi all, >> >> We recently found out that Apache Spark received >>

[jira] [Commented] (SPARK-24374) SPIP: Support Barrier Execution Mode in Apache Spark

2018-12-23 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-24374?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16728020#comment-16728020 ] Debasish Das commented on SPARK-24374: -- Hi [~mengxr] with barrier mode avail

Re: dremel paper example schema

2018-10-29 Thread Debasish Das
Open source impl of dremel is parquet ! On Mon, Oct 29, 2018, 8:42 AM Gourav Sengupta wrote: > Hi, > > why not just use dremel? > > Regards, > Gourav Sengupta > > On Mon, Oct 29, 2018 at 1:35 PM lchorbadjiev < > lubomir.chorbadj...@gmail.com> wrote: > >> Hi, >> >> I'm trying to reproduce the exa

[jira] [Comment Edited] (BEAM-3737) Key-aware batching function

2018-06-09 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507150#comment-16507150 ] Debasish Das edited comment on BEAM-3737 at 6/9/18 8:21 PM: ---

[jira] [Commented] (BEAM-3737) Key-aware batching function

2018-06-09 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-3737?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16507150#comment-16507150 ] Debasish Das commented on BEAM-3737: I saw this is being mentioned in TFMA...

[jira] [Commented] (BEAM-2810) Consider a faster Avro library in Python

2018-03-21 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407543#comment-16407543 ] Debasish Das commented on BEAM-2810: [~chamikara] did you try fastavro and pyavro

[jira] [Commented] (BEAM-1442) Performance improvement of the Python DirectRunner

2018-03-21 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407540#comment-16407540 ] Debasish Das commented on BEAM-1442: Thanks [~robertwb]...I will look into BEAM-

[jira] [Commented] (BEAM-2810) Consider a faster Avro library in Python

2018-03-21 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407538#comment-16407538 ] Debasish Das commented on BEAM-2810: I will try reading bq from beam directly

[jira] [Commented] (BEAM-2810) Consider a faster Avro library in Python

2018-03-21 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-2810?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407532#comment-16407532 ] Debasish Das commented on BEAM-2810: our flow starts from bq-export/gcs avro files

[jira] [Commented] (BEAM-1442) Performance improvement of the Python DirectRunner

2018-03-20 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/BEAM-1442?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=16407241#comment-16407241 ] Debasish Das commented on BEAM-1442: Hi...I am pushing 10MB avro files on local

Re: Lucene, Spark, HDFS question

2018-03-14 Thread Debasish Das
I have written spark lucene integration as part of Verizon trapezium/dal project...you can extract the data stored in hdfs indices and feed it to spark... https://github.com/Verizon/trapezium/tree/master/dal/src/test/scala/com/verizon/bda/trapezium/dal I intend to publish it as spark package as s

ECOS Spark Integration

2017-12-17 Thread Debasish Das
Hi, ECOS is a solver for second order conic programs and we showed the Spark integration at 2014 Spark Summit https://spark-summit.org/2014/quadratic-programing-solver-for-non-negative-matrix-factorization/. Right now the examples show how to reformulate matrix factorization as a SOCP and solve ea

Re: Hinge Gradient

2017-12-17 Thread Debasish Das
If you can point me to previous benchmarks that are done, I would like to use smoothing and see if the LBFGS convergence improved while not impacting linear svc loss. Thanks. Deb On Dec 16, 2017 7:48 PM, "Debasish Das" wrote: Hi Weichen, Traditionally svm are solved using quadratic p

Re: Hinge Gradient

2017-12-16 Thread Debasish Das
re that proves changing max to soft-max can behave > well? > I’m more than happy to see some benchmarks if you can have. > > + Yuhao, who did similar effort in this PR: https://github.com/apache/ > spark/pull/17862 > > Regards > Yanbo > > On Dec 13, 2017, at 12:20 A

Hinge Gradient

2017-12-13 Thread Debasish Das
Hi, I looked into the LinearSVC flow and found the gradient for hinge as follows: Our loss function with {0, 1} labels is max(0, 1 - (2y - 1) (f_w(x))) Therefore the gradient is -(2y - 1)*x max is a non-smooth function. Did we try using ReLu/Softmax function and use that to smooth the hinge los
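The loss and gradient described in this message can be sketched in plain Python. This is a minimal illustration, not Spark's LinearSVC code: it assumes a linear model f_w(x) = w·x with no intercept, and all function names are hypothetical. Note that -(2y - 1)*x is the (sub)gradient only where the margin is violated; elsewhere it is zero. The smooth variant uses softplus, log(1 + exp(m)), as one common surrogate for max(0, m), chosen here for illustration only.

```python
import math


def margin(w, x, y):
    """Return 1 - (2y - 1) * (w . x) for a label y in {0, 1}."""
    sign = 2 * y - 1
    return 1.0 - sign * sum(wi * xi for wi, xi in zip(w, x))


def hinge_loss(w, x, y):
    """Hinge loss max(0, 1 - (2y - 1) * f_w(x))."""
    return max(0.0, margin(w, x, y))


def hinge_subgradient(w, x, y):
    """Subgradient w.r.t. w: -(2y - 1) * x when the margin is violated, else 0."""
    sign = 2 * y - 1
    if margin(w, x, y) > 0:
        return [-sign * xi for xi in x]
    return [0.0 for _ in x]


def softplus_loss(w, x, y):
    """Smooth surrogate log(1 + exp(m)), computed in a numerically stable way."""
    m = margin(w, x, y)
    return max(m, 0.0) + math.log1p(math.exp(-abs(m)))


def softplus_gradient(w, x, y):
    """Gradient of the surrogate: -(2y - 1) * sigmoid(m) * x."""
    sign = 2 * y - 1
    m = margin(w, x, y)
    if m >= 0:
        sig = 1.0 / (1.0 + math.exp(-m))
    else:
        e = math.exp(m)
        sig = e / (1.0 + e)
    return [-sign * sig * xi for xi in x]
```

The surrogate is differentiable everywhere, which is what a quasi-Newton solver such as L-BFGS assumes; whether it changes convergence in practice is the open question raised in the thread.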

Re: [Vote] SPIP: Continuous Processing Mode for Structured Streaming

2017-11-01 Thread Debasish Das
+1 Is there any design doc related to API/internal changes ? Will CP be the default in structured streaming or is it a mode in conjunction with existing behavior? Thanks. Deb On Nov 1, 2017 8:37 AM, "Reynold Xin" wrote: Earlier I sent out a discussion thread for CP in Structured Streaming: ht

Re: Restful API Spark Application

2017-05-16 Thread Debasish Das
You can run l On May 15, 2017 3:29 PM, "Nipun Arora" wrote: > Thanks all for your response. I will have a look at them. > > Nipun > > On Sat, May 13, 2017 at 2:38 AM vincent gromakowski < > vincent.gromakow...@gmail.com> wrote: > >> It's in scala but it should be portable in java >> https://githu

Re: Practical configuration to run LSH in Spark 2.1.0

2017-02-10 Thread Debasish Das
If it is 7m rows and 700k features (or say 1m features) brute force row similarity will run fine as well...check out spark-4823...you can compare quality with approximate variant... On Feb 9, 2017 2:55 AM, "nguyen duc Tuan" wrote: > Hi everyone, > Since spark 2.1.0 introduces LSH (http://spark.ap

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-05 Thread Debasish Das
ector. > There is no API exposed. It is WIP but not yet released. > > On Sat, Feb 4, 2017 at 11:07 PM, Debasish Das > wrote: > >> If we expose an API to access the raw models out of PipelineModel can't >> we call predict directly on it from an API ? Is there a task ope

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Debasish Das
, graph and kernel models we use a lot and for them turned out that mllib style model predict were useful if we change the underlying store... On Feb 4, 2017 9:37 AM, "Debasish Das" wrote: > If we expose an API to access the raw models out of PipelineModel can't we > call p

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Debasish Das
l.Model >predict API". The predict API is in the old mllib package not the new ml >package. >- "why r we using dataframe and not the ML model directly from API" - >Because as of now the new ml package does not have the direct API. > > > On Sat, F

Re: [ML] MLeap: Deploy Spark ML Pipelines w/o SparkContext

2017-02-04 Thread Debasish Das
I am not sure why I will use pipeline to do scoring...idea is to build a model, use model ser/deser feature to put it in the row or column store of choice and provide a api access to the model...we support these primitives in github.com/Verizon/trapezium...the api has access to spark context in loc

Re: Old version of Spark [v1.2.0]

2017-01-16 Thread Debasish Das
You may want to pull up release/1.2 branch and 1.2.0 tag to build it yourself in case the packages are not available. On Jan 15, 2017 2:55 PM, "Md. Rezaul Karim" wrote: > Hi Ayan, > > Thanks a million. > > Regards, > _ > *Md. Rezaul Karim*, BSc, MSc > PhD Researcher

[jira] [Commented] (SPARK-10078) Vector-free L-BFGS

2017-01-08 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15809876#comment-15809876 ] Debasish Das commented on SPARK-10078: -- I looked into the code and I see we

[jira] [Commented] (SPARK-10078) Vector-free L-BFGS

2017-01-02 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793770#comment-15793770 ] Debasish Das commented on SPARK-10078: -- [~mengxr] [~dlwh] is it possibl

[jira] [Comment Edited] (SPARK-10078) Vector-free L-BFGS

2017-01-02 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793760#comment-15793760 ] Debasish Das edited comment on SPARK-10078 at 1/3/17 12:2

[jira] [Commented] (SPARK-10078) Vector-free L-BFGS

2017-01-02 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15793760#comment-15793760 ] Debasish Das commented on SPARK-10078: -- Ideally feature partitioning shoul

[jira] [Comment Edited] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-12-25 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777650#comment-15777650 ] Debasish Das edited comment on SPARK-13857 at 12/26/16 5:5

[jira] [Commented] (SPARK-13857) Feature parity for ALS ML with MLLIB

2016-12-25 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-13857?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15777650#comment-15777650 ] Debasish Das commented on SPARK-13857: -- item->item and user->user was

SortedSetDocValue vs BinaryDocValues

2016-12-19 Thread Debasish Das
Hi, I need to add col1:Array[String], col2:Array[Int] and col3:Array[Float] to docvalue. col1: Array[String] sparse dimension from OLAP world col2: Array[Int] + Array[Float] represents a sparse vector for sparse measure from OLAP world with dictionary encoding for col1 mapped to col2 I have few

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-10-16 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581366#comment-15581366 ] Debasish Das commented on SPARK-5992: - Also do you have hash function for eucli

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2016-10-16 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581361#comment-15581361 ] Debasish Das commented on SPARK-5992: - Did you compare with brute force

[jira] [Commented] (SPARK-4823) rowSimilarities

2016-10-16 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15581359#comment-15581359 ] Debasish Das commented on SPARK-4823: - We use it in multiple usecases internally

Re: Spark Improvement Proposals

2016-10-16 Thread Debasish Das
Thanks Cody for bringing up a valid point...I picked up Spark in 2014 as soon as I looked into it since compared to writing Java map-reduce and Cascading code, Spark made writing distributed code fun...But now as we went deeper with Spark and real-time streaming use-case gets more prominent, I thin

[jira] [Commented] (SPARK-6932) A Prototype of Parameter Server

2016-08-07 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6932?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15411023#comment-15411023 ] Debasish Das commented on SPARK-6932: - [~rxin] [~sowen] Do we have any other ac

Re: Compute pairwise distance

2016-07-07 Thread Debasish Das
>> (point, distances.filter(_._2 <= kthDistance._2)) >> } >> } >> >> This is part of my Local Outlier Factor implementation. >> >> Of course the distances can be sorted because it is an Iterable, but it >> gives an idea. Is it possi

[jira] [Comment Edited] (SPARK-9834) Normal equation solver for ordinary least squares

2016-06-05 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315935#comment-15315935 ] Debasish Das edited comment on SPARK-9834 at 6/5/16 4:4

[jira] [Commented] (SPARK-9834) Normal equation solver for ordinary least squares

2016-06-05 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=15315935#comment-15315935 ] Debasish Das commented on SPARK-9834: - Do you have runtime comparisons that

Re: simultaneous actions

2016-01-18 Thread Debasish Das
Simultaneous action works on cluster fine if they are independent...on local I never paid attention but the code path should be similar... On Jan 18, 2016 8:00 AM, "Koert Kuipers" wrote: > stacktrace? details? > > On Mon, Jan 18, 2016 at 5:58 AM, Mennour Rostom > wrote: > >> Hi, >> >> I am runni

Re: Using spark MLlib without installing Spark

2015-11-26 Thread Debasish Das
Decoupling mllib and core is difficult...it is not intended to run spark core 1.5 with spark mllib 1.6 snapshot...core is more stabilized due to new algorithms getting added to mllib and sometimes you might be tempted to do that but it's not recommended. On Nov 21, 2015 8:04 PM, "Reynold Xin" wrote:

Re: apply simplex method to fix linear programming in spark

2015-11-04 Thread Debasish Das
le to add. You can add an issue in breeze for the enhancement. Alternatively you can use breeze lpsolver as well that uses simplex from apache math. On Nov 4, 2015 1:05 AM, "Zhiliang Zhu" wrote: > Hi Debasish Das, > > Firstly I must show my deep appreciation towards you kind he

Re: apply simplex method to fix linear programming in spark

2015-11-03 Thread Debasish Das
t be steering this a bit off topic: does this need the simplex > method? this is just an instance of nonnegative least squares. I don't > think it relates to LDA either. > > Spark doesn't have any particular support for NNLS (right?) or simplex > though. > > On Mon, N

Re: apply simplex method to fix linear programming in spark

2015-11-02 Thread Debasish Das
Use breeze simplex which in turn uses apache maths simplex...if you want to use interior point method you can use ecos https://github.com/embotech/ecos-java-scala ...spark summit 2014 talk on quadratic solver in matrix factorization will show you example integration with spark. ecos runs as jni proc

Re: Running 2 spark application in parallel

2015-10-23 Thread Debasish Das
You can run 2 threads in driver and spark will fifo schedule the 2 jobs on the same spark context you created (executors and cores)...same idea is used for spark sql thriftserver flow... For streaming i think it lets you run only one stream at a time even if you run them on multiple threads on dri

Re: RDD API patterns

2015-09-17 Thread Debasish Das
Rdd nesting can lead to recursive nesting...i would like to know the usecase and why join can't support it...you can always expose an api over a rdd and access that in another rdd mappartition...use a external data source like hbase cassandra redis to support the api... For ur case group by and th

[jira] [Commented] (SPARK-10408) Autoencoder

2015-09-08 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10408?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14735706#comment-14735706 ] Debasish Das commented on SPARK-10408: -- [~avulanov] In MLP can we change BFG

[jira] [Comment Edited] (SPARK-9834) Normal equation solver for ordinary least squares

2015-09-08 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734170#comment-14734170 ] Debasish Das edited comment on SPARK-9834 at 9/8/15 3:1

Re: Spark ANN

2015-09-07 Thread Debasish Das
Not sure about dropout but if you change the solver from breeze bfgs to breeze owlqn or breeze.proximal.NonlinearMinimizer you can solve ann loss with l1 regularization which will yield elastic net style sparse solutions...using that you can clean up edges which have 0.0 as weight... On Sep 7, 2015 7:35

[jira] [Commented] (SPARK-9834) Normal equation solver for ordinary least squares

2015-09-07 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-9834?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734170#comment-14734170 ] Debasish Das commented on SPARK-9834: - If you are open to

[jira] [Commented] (SPARK-10078) Vector-free L-BFGS

2015-09-07 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-10078?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14734130#comment-14734130 ] Debasish Das commented on SPARK-10078: -- [~mengxr] will it be Breeze L

[jira] [Updated] (SPARK-4823) rowSimilarities

2015-07-30 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Debasish Das updated SPARK-4823: Attachment: SparkMeetup2015-Experiments2.pdf SparkMeetup2015-Experiments1.pdf

[jira] [Commented] (SPARK-4823) rowSimilarities

2015-07-30 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-4823?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14648340#comment-14648340 ] Debasish Das commented on SPARK-4823: - We did more detailed experiment for July

Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-07-28 Thread Debasish Das
> > > Graphically, the access path is as follows: > > > > Spark SQL JDBC Interface -> Spark SQL Parser/Analyzer/Optimizer->Astro > Optimizer-> HBase Scans/Gets -> … -> HBase Region server > > > > > > Regards, > > > > Yan > > >

RE: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-07-27 Thread Debasish Das
Hi Yan, Is it possible to access the hbase table through spark sql jdbc layer ? Thanks. Deb On Jul 22, 2015 9:03 PM, "Yan Zhou.sc" wrote: > Yes, but not all SQL-standard insert variants . > > > > *From:* Debasish Das [mailto:debasish.da...@gmail.com] > *Sent:* Wedn

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
unt instances. > > > On Sun, Jul 26, 2015 at 9:19 AM, Debasish Das > wrote: > > Yeah, I think the idea of confidence is a bit different than what I am > > looking for using implicit factorization to do document clustering. > > > > I basically need (r_ij - w_ih_j)^

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
10x more > than the latter. It's very heavily skewed to pay attention to the > high-count instances. > > > On Sun, Jul 26, 2015 at 9:19 AM, Debasish Das > wrote: > > Yeah, I think the idea of confidence is a bit different than what I am > > looking for using

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
I will think further but in the current implicit formulation with confidence, looks like I am factorizing a 0/1 matrix with weights 1 + alpha*rating for observed (1) values and 1 for unobserved (0) values. It's a bit different from LSA model. >> On Sun, Jul 26, 2015 at 6:45 AM, D
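The formulation described in this message can be sketched as a confidence-weighted squared error over a 0/1 preference matrix. The following is an illustrative plain-Python sketch, not MLlib's ALS code: the function names and the dense list-of-lists layout are assumptions, a rating greater than zero is treated as "observed", and regularization is omitted.

```python
def preference_and_confidence(rating, alpha):
    """Map a raw rating to (p, c).

    Observed entries get preference 1 and confidence 1 + alpha * rating;
    unobserved entries get preference 0 and confidence 1.
    """
    if rating > 0:
        return 1.0, 1.0 + alpha * rating
    return 0.0, 1.0


def implicit_loss(ratings, W, H, alpha):
    """Sum over all (i, j) of c_ij * (p_ij - w_i . h_j)^2.

    ratings: dense matrix as a list of rows (hypothetical layout)
    W: one factor vector per row of ratings
    H: one factor vector per column of ratings
    """
    total = 0.0
    for i, row in enumerate(ratings):
        for j, rating in enumerate(row):
            p, c = preference_and_confidence(rating, alpha)
            pred = sum(a * b for a, b in zip(W[i], H[j]))
            total += c * (p - pred) ** 2
    return total
```

The factorized target is the 0/1 preference p_ij, while the raw rating only scales the weight. Factorizing the rating r_ij itself, as in the (r_ij - w_i h_j)^2 objective mentioned elsewhere in the thread, would replace p with the rating in the residual.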

Re: Confidence in implicit factorization

2015-07-26 Thread Debasish Das
Instead the rating matrix > is the thing being factorized directly. > > On Sun, Jul 26, 2015 at 6:45 AM, Debasish Das > wrote: > > Hi, > > > > Implicit factorization is important for us since it drives recommendation > > when modeling user click/no-click and

Confidence in implicit factorization

2015-07-25 Thread Debasish Das
Hi, Implicit factorization is important for us since it drives recommendation when modeling user click/no-click and also topic modeling to handle 0 counts in document x word matrices through NMF and Sparse Coding. I am a bit confused on this code: val c1 = alpha * math.abs(rating) if (rating > 0

Re: Package Release Annoucement: Spark SQL on HBase "Astro"

2015-07-22 Thread Debasish Das
Does it also support insert operations ? On Jul 22, 2015 4:53 PM, "Bing Xiao (Bing)" wrote: > We are happy to announce the availability of the Spark SQL on HBase > 1.0.0 release. > http://spark-packages.org/package/Huawei-Spark/Spark-SQL-on-HBase > > The main features in this package, dubbed “As

[akka-user] Re: ANNOUNCE: Akka Streams & HTTP 1.0

2015-07-15 Thread Debasish Das
Hi, First of all congratulations on the release of akka-streams and akka-http ! I am writing a service and spray was my initial choice but with akka-http and spray merge I am more inclined to start learning and using akka-http. This service needs to manage a SparkContext and most likely Cassand

Re: Spark application with a RESTful API

2015-07-14 Thread Debasish Das
How do you manage the spark context elastically when your load grows from 1000 users to 1 users ? On Tue, Jul 14, 2015 at 8:31 AM, Hafsa Asif wrote: > I have almost the same case. I will tell you what I am actually doing, if > it > is according to your requirement, then I will love to help y

Re: Few basic spark questions

2015-07-14 Thread Debasish Das
What do you need in sparkR that mllib / ml don't have...most of the basic analysis that you need on stream can be done through mllib components... On Jul 13, 2015 2:35 PM, "Feynman Liang" wrote: > Sorry; I think I may have used poor wording. SparkR will let you use R to > analyze the data, but

Re: Subsecond queries possible?

2015-07-01 Thread Debasish Das
but I'm interested to see how far > it can be pushed. > > Thanks for your help! > > > -- Eric > > On Tue, Jun 30, 2015 at 5:28 PM, Debasish Das > wrote: > >> I got good runtime improvement from hive partitioning, caching the >> dataset and increasing

Re: Subsecond queries possible?

2015-06-30 Thread Debasish Das
I got good runtime improvement from hive partitioning, caching the dataset and increasing the cores through repartition...I think for your case generating mysql style indexing will help further..it is not supported in spark sql yet... I know the dataset might be too big for 1 node mysql but do you

Gossip protocol in Master selection

2015-06-28 Thread Debasish Das
Hi, Akka cluster uses gossip protocol for Master election. The approach in Spark right now is to use Zookeeper for high availability. Interestingly Cassandra and Redis clusters are both using Gossip protocol. I am not sure what is the default behavior right now. If the master dies and zookeeper

Re: Velox Model Server

2015-06-24 Thread Debasish Das
Spark JobServer which would allow triggering > re-computation jobs periodically. We currently just run batch > re-computation and reload factors from S3 periodically. > > We then use Elasticsearch to post-filter results and blend content-based > stuff - which I think might be more efficient

Spark SQL 1.3 Exception

2015-06-24 Thread Debasish Das
, 2015 at 12:21 AM, Debasish Das wrote: > Hi, > > I have some impala created parquet tables which hive 0.13.2 can read fine. > > Now the same table when I want to read using Spark SQL 1.3 I am getting > exception class exception that parquet.hive.serde.ParquetHiveSerde not

Re: Velox Model Server

2015-06-24 Thread Debasish Das
Model sizes are 10m x rank, 100k x rank range. For recommendation/topic modeling I can run batch recommendAll and then keep serving the model using a distributed cache but then I can't incorporate per user model re-predict if user feedback is making the current topk stale. I have to wait for next

Re: Velox Model Server

2015-06-22 Thread Debasish Das
he servlet engine probably doesn't matter at all in comparison. On Sat, Jun 20, 2015, 9:40 PM Debasish Das wrote: > After getting used to Scala, writing Java is too much work :-) > > I am looking for scala based project that's using netty at its core (spray > is one example). >

Re: Velox Model Server

2015-06-20 Thread Debasish Das
Integration of model server with ML pipeline API. On Sat, Jun 20, 2015 at 12:25 PM, Donald Szeto wrote: > Mind if I ask what 1.3/1.4 ML features that you are looking for? > > > On Saturday, June 20, 2015, Debasish Das wrote: > >> After getting used to Scala, writing

Re: Velox Model Server

2015-06-20 Thread Debasish Das
> On Sat, Jun 20, 2015 at 8:00 AM, Charles Earl >> wrote: >> >>> Is velox NOT open source? >>> >>> >>> On Saturday, June 20, 2015, Debasish Das >>> wrote: >>> >>>> Hi, >>>> >>>> The demo

Velox Model Server

2015-06-20 Thread Debasish Das
Hi, The demo of end-to-end ML pipeline including the model server component at Spark Summit was really cool. I was wondering if the Model Server component is based upon Velox or it uses a completely different architecture. https://github.com/amplab/velox-modelserver We are looking for an open s

Re: Welcoming some new committers

2015-06-20 Thread Debasish Das
Congratulations to All. DB great work in bringing quasi newton methods to Spark ! On Wed, Jun 17, 2015 at 3:18 PM, Chester Chen wrote: > Congratulations to All. > > DB and Sandy, great works ! > > > On Wed, Jun 17, 2015 at 3:12 PM, Matei Zaharia > wrote: > >> Hey all, >> >> Over the past 1.5 m

Impala created parquet tables

2015-06-20 Thread Debasish Das
Hi, I have some impala created parquet tables which hive 0.13.2 can read fine. Now the same table when I want to read using Spark SQL 1.3 I am getting exception class exception that parquet.hive.serde.ParquetHiveSerde not found. I am assuming that hive somewhere is putting the parquet-hive-bundl

Re: Does MLLib has attribute importance?

2015-06-18 Thread Debasish Das
Running l1 and picking non-zero coefficients gives a good estimate of interesting features as well... On Jun 17, 2015 4:51 PM, "Xiangrui Meng" wrote: > We don't have it in MLlib. The closest would be the ChiSqSelector, > which works for categorical data. -Xiangrui > > On Thu, Jun 11, 2015 at 4:3

Re: Matrix Multiplication and mllib.recommendation

2015-06-18 Thread Debasish Das
Also in my experiments, it's much faster to do blocked BLAS through cartesian rather than doing sc.union. Here are the details on the experiments: https://issues.apache.org/jira/browse/SPARK-4823 On Thu, Jun 18, 2015 at 8:40 AM, Debasish Das wrote: > Also not sure how threading helps here

Re: Matrix Multiplication and mllib.recommendation

2015-06-18 Thread Debasish Das
Also not sure how threading helps here because Spark puts a partition to each core. On each core maybe there are multiple threads if you are using intel hyperthreading but I will let Spark handle the threading. On Thu, Jun 18, 2015 at 8:38 AM, Debasish Das wrote: > We added SPARK-3066 for t

Re: Matrix Multiplication and mllib.recommendation

2015-06-18 Thread Debasish Das
We added SPARK-3066 for this. In 1.4 you should get the code to do BLAS dgemm based calculation. On Thu, Jun 18, 2015 at 8:20 AM, Ayman Farahat < ayman.fara...@yahoo.com.invalid> wrote: > Thanks Sabarish and Nick > Would you happen to have some code snippets that you can share. > Best > Ayman > >

[jira] [Comment Edited] (SPARK-2336) Approximate k-NN Models for MLLib

2015-06-12 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583886#comment-14583886 ] Debasish Das edited comment on SPARK-2336 at 6/12/15 6:5

[jira] [Commented] (SPARK-2336) Approximate k-NN Models for MLLib

2015-06-12 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-2336?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14583886#comment-14583886 ] Debasish Das commented on SPARK-2336: - Very cool idea Sen. Did you also look

Re: Linear Regression with SGD

2015-06-10 Thread Debasish Das
It's always better to use a quasi newton solver if the runtime and problem scale permits as there are guarantees on optimization...owlqn and bfgs are both quasi newton. Most single node code bases will run quasi newton solves...if you are using sgd better is to use adadelta/adagrad or similar tri

Re: Spark ML decision list

2015-06-07 Thread Debasish Das
What is decision list ? Inorder traversal (or some other traversal) of fitted decision tree On Jun 5, 2015 1:21 AM, "Sateesh Kavuri" wrote: > Is there an existing way in SparkML to convert a decision tree to a > decision list? > > On Thu, Jun 4, 2015 at 10:50 PM, Reza Zadeh wrote: > >> The close

[jira] [Commented] (SPARK-5992) Locality Sensitive Hashing (LSH) for MLlib

2015-06-05 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-5992?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=14575557#comment-14575557 ] Debasish Das commented on SPARK-5992: - Lsh is anyway optimized for cosine...I t

[jira] [Updated] (SPARK-6323) Large rank matrix factorization with Nonlinear loss and constraints

2015-05-28 Thread Debasish Das (JIRA)
[ https://issues.apache.org/jira/browse/SPARK-6323?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Debasish Das updated SPARK-6323: Affects Version/s: (was: 1.4.0) > Large rank matrix factorization with Nonlinear loss

Streaming data + Blocked Model

2015-05-28 Thread Debasish Das
Hi, We want to keep the model created and loaded in memory through Spark batch context since blocked matrix operations are required to optimize on runtime. The data is streamed in through Kafka / raw sockets and Spark Streaming Context. We want to run some prediction operations with the streaming

Re: Help optimizing some spark code

2015-05-26 Thread Debasish Das
You don't need sort...use topbykey if your topk number is less...it uses java heap... On May 24, 2015 10:53 AM, "Tal" wrote: > Hi, > I'm running this piece of code in my program: > > smallRdd.join(largeRdd) > .groupBy { case (id, (_, X(a, _, _))) => a } > .map { case (a, iterable)
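The idea behind this suggestion, keeping only the k largest values per key instead of sorting each group, can be sketched with a bounded min-heap. This is a single-process illustration in plain Python, not Spark's `topByKey` implementation; the function name and the input layout (an iterable of `(key, value)` pairs) are assumptions.

```python
import heapq
from collections import defaultdict


def top_by_key(pairs, k):
    """Return the k largest values per key, in descending order.

    Each key keeps a min-heap of at most k values, so memory per key is
    O(k) and no full sort of the group is needed.
    """
    if k <= 0:
        raise ValueError("k must be positive")
    heaps = defaultdict(list)
    for key, value in pairs:
        heap = heaps[key]
        if len(heap) < k:
            heapq.heappush(heap, value)
        elif value > heap[0]:
            # Replace the current smallest of the top-k with the new value.
            heapq.heapreplace(heap, value)
    return {key: sorted(heap, reverse=True) for key, heap in heaps.items()}
```

For small k this costs O(n log k) over n pairs, versus O(n log n) for sorting every group, which is the trade-off the message alludes to.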

Re: GraphX implementation of ALS?

2015-05-26 Thread Debasish Das
In general for implicit feedback in als you have to do a blocked gram matrix calculation which might not fit in graphx flow and lot of blocked operations can be used...but if your loss is likelihood or kl divergence or just simple sgd update rules and not least square then graphx idea makes sense..

Re: Power iteration clustering

2015-05-26 Thread Debasish Das
5:53 PM, "Joseph Bradley" wrote: > That's a good question; I could imagine it being much more efficient if > kept in a BlockMatrix and using BLAS2 ops. > > On Sat, May 23, 2015 at 8:09 PM, Debasish Das > wrote: > >> Hi, >> >> What was the m

Re: Kryo option changed

2015-05-24 Thread Debasish Das
23, 2015 at 6:37 PM, Ted Yu wrote: > >> Pardon me. >> >> Please use '8192k' >> >> Cheers >> >> On Sat, May 23, 2015 at 6:24 PM, Debasish Das >> wrote: >> >>> Tried "8mb"...still I am failing on the s

Re: spark packages

2015-05-24 Thread Debasish Das
, May 23, 2015, Patrick Wendell wrote: >> >>> Yes - spark packages can include non ASF licenses. >>> >>> On Sat, May 23, 2015 at 6:16 PM, Debasish Das >>> wrote: >>> > Hi, >>> > >>> > Is it possible to add GPL/LGPL c
