I got your checkin. I need to run logistic regression with SGD vs. BFGS for my
current use cases, but your next checkin will update the logistic regression
with L-BFGS, right? Are you adding it to the regression package as well?
Thanks.
Deb
On Mon, Apr 7, 2014 at 7:00 PM, DB Tsai dbt...@stanford.edu
By the way, what's the idea? The labeled data set is an RDD which is
cached on all nodes.
Is the BFGS solver maintained on the master, or is each worker supposed to
maintain its own BFGS solver?
On Mon, Apr 7, 2014 at 11:23 PM, Debasish Das debasish.da...@gmail.com wrote:
I got your checkin. I
Matei's link seems to point to a specific starter project as part of the
starter list, but here is the list itself:
https://issues.apache.org/jira/issues/?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)
On Mon, Apr 7,
Ha ha! nice try, sheepherder! ;-)
On Tue, Apr 8, 2014 at 12:37 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Shh, maybe I really wanted people to fix that one issue.
On Apr 8, 2014, at 9:34 AM, Aaron Davidson ilike...@gmail.com wrote:
Matei's link seems to point to a specific starter
Hi,
Is GraphX, on top of Apache Spark, able to process large-scale
distributed graph traversal and computation in real time? What is the query
execution engine that distributes the query on top of GraphX and Apache Spark?
My typical use case is a large-scale distributed graph traversal in real
GraphX, like Spark, will not typically be real-time (where by real-time
here I assume you mean of the order of a few 10s-100s ms, up to a few
seconds).
Spark can in some cases approach the upper boundary of this definition (a
second or two, possibly less) when data is cached in memory and the
Hi,
I am able to read a custom input format in spark.
scala> val inputRead = sc.newAPIHadoopFile(
  "hdfs://127.0.0.1/user/cloudera/date_dataset/",
  classOf[io.reader.PatternInputFormat],
  classOf[org.apache.hadoop.io.LongWritable],
  classOf[org.apache.hadoop.io.Text])
However, doing a
inputRead.count()
Are you using the PatternInputFormat from this blog post?
https://hadoopi.wordpress.com/2013/05/31/custom-recordreader-processing-string-pattern-delimited-records/
If so, you need to set the pattern in the configuration before attempting to
read data with that InputFormat:
String regex =
It all depends on what kind of traversal. If it's point traversal, then
something based on random access would be great.
If it's more scan-like traversal, then Spark will fit.
On Tue, Apr 8, 2014 at 4:56 PM, Evan Chan e...@ooyala.com wrote:
I doubt Titan would be able to give you traversal of
Likely neither will give real-time for full-graph traversal, no. And once
in memory, GraphX would definitely be faster for breadth-first traversal.
But for vertex-centric traversals (starting from a vertex and traversing
edges from there, such as friends-of-friends queries, etc.), Titan is
Nick and Koert summarized it pretty well. Just to clarify and give some
concrete examples.
If you want to start with a specific vertex and follow some path, it is
probably easier and faster to use a key-value store, or even MySQL or a
graph database.
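
To make the "start from a vertex and follow a path" case concrete, here is a minimal sketch in plain Python. The dict stands in for a key-value store, and the vertex names and the friends_of_friends helper are made up for illustration; they are not part of Spark, GraphX, or Titan.

```python
# Hypothetical adjacency data; a dict stands in for a key-value store.
adjacency = {
    "alice": {"bob", "carol"},
    "bob": {"alice", "dave"},
    "carol": {"alice", "erin"},
    "dave": {"bob"},
    "erin": {"carol"},
}

def friends_of_friends(store, start):
    # One lookup for the start vertex, then one lookup per direct friend.
    direct = store.get(start, set())
    result = set()
    for friend in direct:
        result |= store.get(friend, set())
    # Exclude the start vertex and its direct friends from the answer.
    return result - direct - {start}

fof = friends_of_friends(adjacency, "alice")
```

The query touches only the start vertex and its neighbours, which is why random access by key tends to suit it better than a scan over the whole graph.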
If you want to count the average length
Hi Debasish,
The L-BFGS solver will be in the master like the GD solver, and the part
that is parallelized is computing the gradient of each input row and
summing them up.
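
The split described above can be sketched in plain Python, with lists standing in for RDD partitions. This is not the MLlib code: all function names and the toy data are assumptions for illustration, and a plain gradient step is used on the "master" side in place of the L-BFGS update to keep the sketch short.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def partition_gradient(rows, weights):
    # "Worker" side: sum the logistic-loss gradient over one partition's rows.
    grad = [0.0] * len(weights)
    for label, features in rows:
        margin = sum(w * x for w, x in zip(weights, features))
        error = sigmoid(margin) - label
        for i, x in enumerate(features):
            grad[i] += error * x
    return grad

def full_gradient(partitions, weights):
    # "Master" side: add up the per-partition sums and average them.
    total = [0.0] * len(weights)
    count = 0
    for rows in partitions:
        partial = partition_gradient(rows, weights)
        total = [t + p for t, p in zip(total, partial)]
        count += len(rows)
    return [t / count for t in total]

def train(partitions, weights, step=0.5, iterations=200):
    # The solver state (here only the weights) lives in one place.
    # A plain gradient step stands in for the L-BFGS update.
    for _ in range(iterations):
        grad = full_gradient(partitions, weights)
        weights = [w - step * g for w, g in zip(weights, grad)]
    return weights

# Toy data: two "partitions" of (label, [bias, feature]) rows.
partitions = [
    [(0.0, [1.0, -2.0]), (0.0, [1.0, -1.0])],
    [(1.0, [1.0, 1.0]), (1.0, [1.0, 2.0])],
]
weights = train(partitions, [0.0, 0.0])
```

Only the gradient sum depends on the data, so that is the only step that would need to run on the workers; the weight update sees a single aggregated vector.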
I prefer to make the optimizer pluggable instead of adding a new
LogisticRegressionWithLBFGS, since 98% of the code will be the
Yup, that's what I expected: the L-BFGS solver is in the master, and gradient
computation per RDD is done on each of the workers.
This miniBatchFraction is also a heuristic which I don't think makes sense
for LogisticRegressionWithBFGS. Does it?
On Tue, Apr 8, 2014 at 3:44 PM, DB Tsai
I think mini batch is still useful for L-BFGS.
One use case is to initialize the weights by training on smaller
subsamples of the data, using mini batch with L-BFGS.
Then we could use the weights trained with mini batch to start another
training process with full data.
Sincerely,
DB
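
The warm-start idea in the message above can be sketched in plain Python. Everything here is an assumption for illustration: the toy data, the helper names, the deterministic slice standing in for random mini-batch sampling, and plain gradient descent standing in for L-BFGS. It is not the MLlib implementation.

```python
import math

def log_loss_and_grad(rows, w):
    # Mean logistic loss and its gradient over (label, features) rows.
    loss, grad = 0.0, [0.0] * len(w)
    for y, x in rows:
        z = sum(wi * xi for wi, xi in zip(w, x))
        # log(1 + exp(z)) - y * z, written in a numerically stable form.
        loss += max(z, 0.0) + math.log1p(math.exp(-abs(z))) - y * z
        p = 1.0 / (1.0 + math.exp(-z))
        for i, xi in enumerate(x):
            grad[i] += (p - y) * xi
    n = len(rows)
    return loss / n, [g / n for g in grad]

def fit(rows, w, step=0.2, iterations=50):
    # Plain gradient descent as a stand-in for the real solver.
    for _ in range(iterations):
        _, grad = log_loss_and_grad(rows, w)
        w = [wi - step * gi for wi, gi in zip(w, grad)]
    return w

# Toy data: label is 1 when the feature is positive; first column is a bias.
full = [(1.0 if x > 0 else 0.0, [1.0, float(x)])
        for x in (-4, -3, -2, -1, 1, 2, 3, 4)]
subsample = full[::2]  # deterministic slice standing in for a random sample

# Stage 1: train on the subsample from zero weights.
warm = fit(subsample, [0.0, 0.0])

# Stage 2: continue on the full data, starting from the stage-1 weights.
cold_loss, _ = log_loss_and_grad(full, [0.0, 0.0])
warm_loss, _ = log_loss_and_grad(full, warm)
final = fit(full, warm)
final_loss, _ = log_loss_and_grad(full, final)
```

Comparing cold_loss with warm_loss shows how much of the work the cheap first stage already did before the full-data run starts. Whether this pays off on real data is untested here, as the thread itself notes.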
Have you experimented with it? For logistic regression at least, given
sufficient iterations and tolerance, BFGS should converge to the same
solution both ways.
On Tue, Apr 8, 2014 at 4:19 PM, DB Tsai dbt...@stanford.edu wrote:
I think mini batch is still useful for L-BFGS.
One
I haven't experimented with it. That's a use case I could think of in theory. ^^
However, from what I saw, BFGS converges really fast, so I only
need 20~30 iterations in general.
Sincerely,
DB Tsai
---
My Blog: https://www.dbtsai.com
LinkedIn: