By the way, what's the idea here? The labeled data set is an RDD which is
cached on all nodes...
Is the BFGS solver maintained on the master, or is each worker supposed to
maintain its own BFGS state?
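For context, the usual arrangement can be sketched as follows (a hedged sketch in plain Scala, not the actual MLlib code: a local Seq stands in for the cached RDD, and plain gradient descent stands in for BFGS). The optimizer state lives only on the driver; each iteration ships just the current weights out and aggregates per-point gradient contributions back:

```scala
// Sketch only: optimizer state stays on the driver; the map/reduce below
// is what would run on the workers over the cached RDD in Spark.
case class LabeledPoint(label: Double, features: Array[Double])

def dot(a: Array[Double], b: Array[Double]): Double =
  a.indices.map(i => a(i) * b(i)).sum

// Per-point logistic-loss gradient: (sigmoid(w . x) - y) * x
def pointGradient(w: Array[Double], p: LabeledPoint): Array[Double] = {
  val err = 1.0 / (1.0 + math.exp(-dot(w, p.features))) - p.label
  p.features.map(_ * err)
}

// One driver-side step: in Spark, data.map(...).reduce(...) would be
// rdd.map(...).reduce(...), shipping only `w` to the workers each time.
def step(w: Array[Double], data: Seq[LabeledPoint], lr: Double): Array[Double] = {
  val g = data.map(p => pointGradient(w, p))
              .reduce((a, b) => a.indices.map(i => a(i) + b(i)).toArray)
  w.indices.map(i => w(i) - lr * g(i) / data.size).toArray
}
```

A quasi-Newton solver like L-BFGS follows the same pattern: only the gradient aggregation is distributed, while the curvature history stays with the driver-side optimizer.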
On Mon, Apr 7, 2014 at 11:23 PM, Debasish Das wrote:
> I got your checkin. I need to run logistic reg
I got your checkin. I need to run logistic regression SGD vs BFGS for my
current use cases, but your next checkin will update the logistic regression
with L-BFGS, right? Are you adding it to the regression package as well?
Thanks.
Deb
On Mon, Apr 7, 2014 at 7:00 PM, DB Tsai wrote:
> Hi guys,
>
> T
I’d suggest looking for the issues labeled “Starter” on JIRA. You can find them
here:
https://issues.apache.org/jira/browse/SPARK-1438?jql=project%20%3D%20SPARK%20AND%20labels%20%3D%20Starter%20AND%20status%20in%20(Open%2C%20%22In%20Progress%22%2C%20Reopened)
Matei
On Apr 7, 2014, at 9:45 PM, M
Hi Sujeet,
Thanks. I went through the website and it looks great. Is there a list of
items that I can choose from for contribution?
Thanks
Mukesh
On Mon, Apr 7, 2014 at 10:14 PM, Sujeet Varakhedi
wrote:
> This is a good place to start:
> https://cwiki.apache.org/confluence/display/SPARK/Contri
Hi guys,
The latest PR uses Breeze's L-BFGS implementation, which is introduced by
Xiangrui's sparse input format work in SPARK-1212.
https://github.com/apache/spark/pull/353
Now, it works with the new sparse framework!
Any feedback would be greatly appreciated.
Thanks.
Sincerely,
DB Tsai
Cool. I'll look at making the code change in FlumeUtils and generating a
pull request.
As far as the use case goes, the volume of messages we have is currently
about 30 MB per second, which may grow beyond what a 1 Gbit network adapter
can handle.
- Christophe
On Apr 7, 2014 1:51 PM, "Michael Ernest"
I don't see why not. If one were doing something similar with straight
Flume, you'd start an agent on each node where you want to receive Avro/RPC
events. In the absence of clearer insight into your use case, I'm puzzling
just a little over why it's necessary for each Worker to be its own receiver,
but there's
Could it be as simple as just changing FlumeUtils to accept a list of
host/port number pairs to start the RPC servers on?
On 4/7/14, 12:58 PM, Christophe Clapp wrote:
Based on the source code here:
https://github.com/apache/spark/blob/master/external/flume/src/main/scala/org/apache/spark/streaming/flume/FlumeUtils.scala
It looks like in its current version, FlumeUtils does not support
starting an Avro RPC server on more than one worker.
- Christophe
On 4/7
Right, but at least in my case, no Avro RPC server was started on any of
the Spark worker nodes except for one. I don't know if that's just a
configuration issue with my setup or if it's expected behavior. I would
need Spark to start Avro RPC servers on every worker rather than just one.
- Chri
You can configure your sinks to write to one or more Avro sources in a
load-balanced configuration.
https://flume.apache.org/FlumeUserGuide.html#flume-sink-processors
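A load-balancing sink group along those lines might look like this (an illustrative fragment; the agent name, host names, and ports are examples, not from this thread):

```properties
# Hypothetical Flume agent config: two Avro sinks on one channel,
# grouped into a load-balancing sink group (hosts/ports are examples).
a1.sinks = k1 k2
a1.sinkgroups = g1
a1.sinkgroups.g1.sinks = k1 k2
a1.sinkgroups.g1.processor.type = load_balance
a1.sinkgroups.g1.processor.selector = round_robin
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = worker1.example.com
a1.sinks.k1.port = 4141
a1.sinks.k2.type = avro
a1.sinks.k2.channel = c1
a1.sinks.k2.hostname = worker2.example.com
a1.sinks.k2.port = 4141
```

Each Avro sink would point at a host where a receiver is actually listening, which is why this only helps once receivers run on more than one worker.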
mfe
On Mon, Apr 7, 2014 at 3:19 PM, Christophe Clapp
wrote:
> Hi,
>
> From my testing of Spark Streaming with Flume, it seems t
Hi,
From my testing of Spark Streaming with Flume, it seems that there's
only one of the Spark worker nodes that runs a Flume Avro RPC server to
receive messages at any given time, as opposed to every Spark worker
running an Avro RPC server to receive messages. Is this the case? Our
use-case
Hi Deb,
It would be helpful if you could attach the logs. It is strange to see
that you can make 4 iterations but not 10.
Xiangrui
On Mon, Apr 7, 2014 at 10:36 AM, Debasish Das wrote:
> I am using master...
>
> No negative indexes...
>
> If I run with 4 iterations it runs fine and I can generat
I agree these should be disabled right away, and the JIRA can be used to
track fixing / turning them back on.
On Mon, Apr 7, 2014 at 11:33 AM, Michael Armbrust wrote:
> There is a JIRA for one of the flakey tests here:
> https://issues.apache.org/jira/browse/SPARK-1409
>
>
> On Mon, Apr 7, 2014
Yes, I will take a look at those tests ASAP.
TD
On Mon, Apr 7, 2014 at 11:32 AM, Patrick Wendell wrote:
> TD - do you know what is going on here?
>
> I looked into this a bit, and at least a few of these use
> Thread.sleep() and assume the sleep will be exact, which is wrong. We
> should
There is a JIRA for one of the flakey tests here:
https://issues.apache.org/jira/browse/SPARK-1409
On Mon, Apr 7, 2014 at 11:32 AM, Patrick Wendell wrote:
> TD - do you know what is going on here?
>
> I looked into this a bit, and at least a few of these use
> Thread.sleep() and assume the
TD - do you know what is going on here?
I looked into this a bit, and at least a few of these use
Thread.sleep() and assume the sleep will be exact, which is wrong. We
should disable all the tests that do, and they should probably be
re-written to virtualize time.
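A minimal sketch of what virtualizing time means here (a hand-rolled manual clock for illustration, not Spark's actual test utilities): the code under test reads time from an injectable clock, and the test advances that clock deterministically instead of sleeping:

```scala
// Sketch: a manual clock lets tests advance time deterministically
// instead of relying on Thread.sleep() being exact.
trait Clock { def currentTime: Long }

class ManualClock(private var now: Long = 0L) extends Clock {
  def currentTime: Long = now
  def advance(ms: Long): Unit = now += ms
}

// Something time-dependent to test: fires at most once per intervalMs.
class Throttler(clock: Clock, intervalMs: Long) {
  private var last: Option[Long] = None
  def tryFire(): Boolean =
    if (last.forall(t => clock.currentTime - t >= intervalMs)) {
      last = Some(clock.currentTime)
      true
    } else false
}
```

With this shape, a test calls `clock.advance(...)` and asserts on behavior at exact virtual instants, so scheduler jitter can't flake the suite.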
- Patrick
On Mon, Apr 7, 20
I've hit this issue when Jenkins seems to be very busy.
On Monday, April 7, 2014, Kay Ousterhout wrote:
> Hi all,
>
> The InputStreamsSuite seems to have some serious flakiness issues -- I've
> seen the file input stream fail many times and now I'm seeing some actor
> input stream test failures
Hi all,
The InputStreamsSuite seems to have some serious flakiness issues -- I've
seen the file input stream fail many times and now I'm seeing some actor
input stream test failures (
https://amplab.cs.berkeley.edu/jenkins/job/SparkPullRequestBuilder/13846/consoleFull)
on what I think is an unrela
I am using master...
No negative indexes...
If I run with 4 iterations it runs fine and I can generate factors...
With 10 iterations run fails with array index out of bound...
25m users and 3m products are within int limits
Does it help if I point you to the logs for both runs?
I
Hi Deb,
This thread is for the out-of-bounds error you described. I don't think
the number of iterations has any effect here. My questions were:
1) Are you using the master branch or a particular commit?
2) Do you have negative or out-of-integer-range user or product ids?
Try to print out the max
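That range check can be sketched in plain Scala as follows (in Spark the min/max would come from a reduce over the ratings RDD; the helper name is illustrative):

```scala
// Sketch: verify user/product ids are non-negative and fit in an Int,
// since out-of-range or negative ids can surface as index errors later.
def idRangeOk(ids: Seq[Long]): Boolean = {
  val lo = ids.min
  val hi = ids.max
  lo >= 0L && hi <= Int.MaxValue.toLong
}
```

Printing the offending min/max alongside the boolean makes it easy to spot an id that silently overflowed Int.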
This is a good place to start:
https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark
Sujeet
On Mon, Apr 7, 2014 at 9:20 AM, Mukesh G wrote:
> Hi,
>
> How do I contribute to Spark and its associated projects?
>
> Appreciate the help...
>
> Thanks
>
> Mukesh
>
Hi,
How do I contribute to Spark and its associated projects?
Appreciate the help...
Thanks
Mukesh
Tachyon is Java 6 compatible from version 0.4. Besides putting input/output
data in Tachyon ( http://tachyon-project.org/Running-Spark-on-Tachyon.html ),
Spark applications can also persist data into Tachyon (
https://github.com/apache/spark/blob/master/docs/scala-programming-guide.md
).
On Mon, A
I noticed there is a dependency on Tachyon in Spark core 1.0.0-SNAPSHOT.
How does that work? I believe Tachyon is written in Java 7, yet Spark
claims to be Java 6 compatible.
Nick,
I already have this code, which does the dictionary generation and then maps
strings etc. to ints... I think the core algorithm should stay in ints... If
you like, I can add this code in MFUtils.scala; that's the convention I
followed, similar to MLUtils.scala... actually these functions should be ev
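The kind of dictionary generation being described can be sketched like this (hypothetical helper, not the actual MFUtils.scala code): build a stable String -> Int mapping once up front, keep the reverse map for translating results back, and leave the core algorithm in ints:

```scala
// Sketch: map string ids to dense ints once; keep the reverse mapping
// so factors can be reported against the original string ids.
def buildDictionary(ids: Seq[String]): (Map[String, Int], Map[Int, String]) = {
  val forward = ids.distinct.zipWithIndex.toMap
  val reverse = forward.map { case (s, i) => (i, s) }
  (forward, reverse)
}
```

The dense 0..n-1 ints this produces also play well with hash partitioning and array-backed factor storage.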
On the partitioning / id keys: if we were to look at hash partitioning, how
feasible would it be to just allow the user and item ids to be strings? A
lot of the time these ids are strings anyway (UUIDs and so on), and it's
really painful to translate between String <-> Int the whole time.
Are there a