Re: Why spark-submit works with package not with jar

2024-05-06 Thread David Rabinowitz
Hi, It seems this library is several years old. Have you considered using the Google provided connector? You can find it in https://github.com/GoogleCloudDataproc/spark-bigquery-connector Regards, David Rabinowitz On Sun, May 5, 2024 at 6:07 PM Jeff Zhang wrote: > Are you s

Re: JDK version support policy?

2023-06-13 Thread David Li
we're >>>> going to see enough folks moving to JRE17 by the Spark 4 release unless we >>>> have a strong benefit from dropping 11 support I'd be inclined to keep it. >>>> >>>> On Tue, Jun 6, 2023 at 9:08 PM Dongjoon Hyun wrote: >>>>> I'm al

JDK version support policy?

2023-06-06 Thread David Li
Hello Spark developers, I'm from the Apache Arrow project. We've discussed Java version support [1], and crucially, whether to continue supporting Java 8 or not. As Spark is a big user of Arrow in Java, I was curious what Spark's policy here was. If Spark intends to stay on Java 8, for

SPARK-22256

2020-12-11 Thread David McWhorter
Hello, my name is David McWhorter and I created a new pull request to address the SPARK-22256 ticket at https://github.com/apache/spark/pull/30739. This change adds a memory overhead setting for the Spark driver running on Mesos. This is a reopening of a prior pull request that was never merged

Unsubscribe

2020-12-08 Thread David Zhou
Unsubscribe

[DISCUSS] Reducing memory usage of toPandas with Arrow "self_destruct" option

2020-09-10 Thread David Li
er [3]: https://github.com/pandas-dev/pandas/issues/35530 [*] See my comment in https://issues.apache.org/jira/browse/ARROW-9878. Thanks, David - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org

Spark 3.0 and ORC 1.6

2020-01-28 Thread David Christle
dependence on Hadoop 2.9 is not required). Again, these may be non-issues, but I wanted to kindle discussion around whether this can make the cut for 3.0, since I imagine it’s a major upgrade many users will focus on migrating to once released. Kind regards, David Christle

Re: [Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-30 Thread David Vogelbacher
e default for spark.kubernetes.executor.cores would be. Seeing that I wanted more than 1 and Yinan wants less, leaving it at 1 might be best. Thanks, David From: Kimoon Kim <kim...@pepperdata.com> Date: Friday, March 30, 2018 at 4:28 PM To: Yinan Li <liyinan...@gmail.com> Cc: David V

[Kubernetes] Resource requests and limits for Driver and Executor Pods

2018-03-29 Thread David Vogelbacher
, David

RE: Launching multiple spark jobs within a main spark job.

2016-12-21 Thread David Hodeffi
I am not aware of any problem with that. Anyway, if you run a Spark application you would have multiple jobs, so it makes sense that it is not a problem. Thanks David. From: Naveen [mailto:hadoopst...@gmail.com] Sent: Wednesday, December 21, 2016 9:18 AM To: dev@spark.apache.org; u

Re: SPARK-13843 and future of streaming backends

2016-03-25 Thread David Nalley
> As far as group / artifact name compatibility, at least in the case of > Kafka we need different artifact names anyway, and people are going to > have to make changes to their build files for spark 2.0 anyway. As > far as keeping the actual classes in org.apache.spark to not break > code

Re: [ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-02 Thread David Russell
o get rid of all cluster > management when using Spark. You might find one of the hosted Spark platform solutions such as Databricks or Amazon EMR that handle cluster management for you a good place to start. At least in my experience, they got me up and running

[ANNOUNCE] New SAMBA Package = Spark + AWS Lambda

2016-02-01 Thread David Russell
> ROSE Spark Package: https://github.com/onetapbeyond/opencpu-spark-executor Questions, suggestions, feedback welcome. David -- "All that is gold does not glitter, Not all those who wander are lost."

Re: ROSE: Spark + R on the JVM.

2016-01-13 Thread David Russell
n Java, JavaScript and .NET that can easily support your use case. The outputs of your DeployR integration could then become inputs to your data processing system. David "All that is gold does not glitter, Not all those who wander are lost." Original Message Subject: Re:

Re: Eigenvalue solver

2016-01-12 Thread David Hall
anker/blob/71b0ff3989d5191dc6a78c40c4a7a9967cbb0e49/venv/lib/python2.7/site-packages/scipy/sparse/linalg/eigen/arpack/arpack.py#L1049 ) I'm happy to help more if you decide to go this route, here, or on the scala-breeze google group, or on github. -- David On Tue, Jan 12, 2016 at 10:28 AM, Lydia Ickler <ickle...@googlemail.com>

ROSE: Spark + R on the JVM.

2016-01-12 Thread David
to [take a look](https://github.com/onetapbeyond/opencpu-spark-executor). Any feedback, questions etc very welcome. David "All that is gold does not glitter, Not all those who wander are lost."

Re: ROSE: Spark + R on the JVM.

2016-01-12 Thread David Russell
Hi Corey, > Would you mind providing a link to the github? Sure, here is the github link you're looking for: https://github.com/onetapbeyond/opencpu-spark-executor David "All that is gold does not glitter, Not all those who wander are lost." Original Message ---

Re: [discuss] dropping Python 2.6 support

2016-01-11 Thread David Chin
>>> Python 2.6 is ancient, and is pretty slow in many aspects (e.g. json >>> parsing) when compared with Python 2.7. Some libraries that Spark depends on >>> stopped supporting 2.6. We can still convince the library maintainers to >>> support 2.6, but it will be extra

Differing performance in self joins

2015-08-26 Thread David Smith
, David -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Differing-performance-in-self-joins-tp13864.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com

Re: [mllib] Is there any bugs to divide a Breeze sparse vectors at Spark v1.3.0-rc3?

2015-03-18 Thread David Hall
sure. On Wed, Mar 18, 2015 at 12:19 AM, Debasish Das debasish.da...@gmail.com wrote: Hi David, We are stress testing breeze.optimize.proximal and nnls...if you are cutting a release now, we will need another release soon once we get the runtime optimizations in place and merged to breeze

Re: [mllib] Is there any bugs to divide a Breeze sparse vectors at Spark v1.3.0-rc3?

2015-03-17 Thread David Hall
ping? On Sun, Mar 15, 2015 at 9:38 PM, David Hall david.lw.h...@gmail.com wrote: snapshot is pushed. If you verify I'll publish the new artifacts. On Sun, Mar 15, 2015 at 1:14 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: David Hall who is a breeze creator told me that it's a bug

Re: [mllib] Is there any bugs to divide a Breeze sparse vectors at Spark v1.3.0-rc3?

2015-03-15 Thread David Hall
snapshot is pushed. If you verify I'll publish the new artifacts. On Sun, Mar 15, 2015 at 1:14 AM, Yu Ishikawa yuu.ishikawa+sp...@gmail.com wrote: David Hall who is a breeze creator told me that it's a bug. So, I made a jira ticket about this issue. We need to upgrade breeze from 0.11.1

Re: Implementing TinkerPop on top of GraphX

2015-01-15 Thread David Robinson
I am new to Spark and GraphX, however, I use Tinkerpop backed graphs and think the idea of using Tinkerpop as the API for GraphX is a great idea and hope you are still headed in that direction. I noticed that Tinkerpop 3 is moving into the Apache family:

Re: spark-yarn_2.10 1.2.0 artifacts

2014-12-22 Thread David McWhorter
Thank you, Sean, using spark-network-yarn seems to do the trick. On 12/19/2014 12:13 PM, Sean Owen wrote: I believe spark-yarn does not exist from 1.2 onwards. Have a look at spark-network-yarn for where some of that went, I believe. On Fri, Dec 19, 2014 at 5:09 PM, David McWhorter mcwhor

spark-yarn_2.10 1.2.0 artifacts

2014-12-19 Thread David McWhorter
or insights into how to use spark-yarn_2.10 1.2.0 in a maven build would be appreciated. David -- David McWhorter Software Engineer Commonwealth Computer Research, Inc. 1422 Sachem Place, Unit #1 Charlottesville, VA 22901 mcwhor...@ccri.com | 434.299.0090x204

Re: EC2 clusters ready in launch time + 30 seconds

2014-10-06 Thread David Rowe
how these projects turn out. David, Packer looks very, very interesting. I'm gonna look into it more next week. Nick On Thu, Oct 2, 2014 at 8:00 PM, Nate D'Amico n...@reactor8.com wrote: Bit of progress on our end, bit of lagging as well. Our guy

Re: Breeze Library usage in Spark

2014-10-03 Thread David Hall
yeah, breeze.storage.Zero was introduced in either 0.8 or 0.9. On Fri, Oct 3, 2014 at 9:45 AM, Xiangrui Meng men...@gmail.com wrote: Did you add a different version of breeze to the classpath? In Spark 1.0, we use breeze 0.7, and in Spark 1.1 we use 0.9. If the breeze version you used is

Re: BlockManager issues

2014-09-22 Thread David Rowe
I've run into this with large shuffles - I assumed that there was contention between the shuffle output files and the JVM for memory. Whenever we start getting these fetch failures, it corresponds with high load on the machines the blocks are being fetched from, and in some cases complete

Source code for mining big data with Spark

2014-09-14 Thread David Tung
Hi all, I watched an impressive Spark demo video by Reynold Xin and Aaron Davidson on YouTube ( https://www.youtube.com/watch?v=FjhRkfAuU7I ). Can someone let me know where I can find the source code for the demo? I can't see the source code clearly in the video. Thanks in advance

Re: Is breeze thread safe in Spark?

2014-09-03 Thread David Hall
mutating operations are not thread safe. Operations that don't mutate should be thread safe. I can't speak to what Evan said, but I would guess that the way they're using += should be safe. On Wed, Sep 3, 2014 at 11:58 AM, RJ Nowling rnowl...@gmail.com wrote: David, Can you confirm
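The distinction drawn above (mutating operations need coordination, non-mutating ones do not) can be sketched without Breeze at all. The following is a minimal illustration using plain `Array[Double]` as a stand-in for a vector; `ThreadSafeSum`, `addInPlace`, and `sumWithLock` are hypothetical names introduced here, not Breeze or Spark APIs, and the lock is just one possible way to make a shared in-place update safe.

```scala
object ThreadSafeSum {
  // Element-wise in-place add. It mutates `acc`, so concurrent calls on the
  // same `acc` need external synchronization (hypothetical helper, not Breeze).
  def addInPlace(acc: Array[Double], x: Array[Double]): Unit = {
    var i = 0
    while (i < acc.length) {
      acc(i) += x(i)
      i += 1
    }
  }

  // Each thread adds one input into a shared accumulator while holding a lock
  // on it. The inputs are only read, never written, so they need no lock.
  def sumWithLock(inputs: Seq[Array[Double]], dim: Int): Array[Double] = {
    val shared = new Array[Double](dim)
    val threads = inputs.map { x =>
      new Thread(new Runnable {
        def run(): Unit = shared.synchronized { addInPlace(shared, x) }
      })
    }
    threads.foreach(_.start())
    threads.foreach(_.join()) // join makes the workers' writes visible here
    shared
  }
}
```

For example, `ThreadSafeSum.sumWithLock(Seq(Array(1.0, 2.0), Array(3.0, 4.0)), 2)` adds both inputs into one accumulator. Whether a particular `+=` inside Spark needs such a lock depends on whether the target is shared between threads, which this sketch does not settle.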

Re: Linear CG solver

2014-06-27 Thread David Hall
I have no ideas on benchmarks, but breeze has a CG solver: https://github.com/scalanlp/breeze/tree/master/math/src/main/scala/breeze/optimize/linear/ConjugateGradient.scala

Re: mllib vector templates

2014-05-05 Thread David Hall
On Mon, May 5, 2014 at 3:40 PM, DB Tsai dbt...@stanford.edu wrote: David, Could we use Int, Long, Float as the data feature spaces, and Double for optimizer? Yes. Breeze doesn't allow operations on mixed types, so you'd need to convert the double vectors to Floats if you wanted, e.g. dot

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-29 Thread David Hall
://jmlr.org/proceedings/papers/v2/schraudolph07a/schraudolph07a.pdf -- David On Tue, Apr 29, 2014 at 3:30 PM, DB Tsai dbt...@stanford.edu wrote: Have a quick hack to understand the behavior of SLBFGS (Stochastic-LBFGS) by overwriting the breeze iterations method to get the current LBFGS step

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-28 Thread David Hall
That's right. FWIW, caching should be automatic now, but it might be that the version of Breeze you're using doesn't do that yet. Also, in breeze.util._ there's an implicit that adds a tee method to iterator, and also a last method. Both are useful for things like this. -- David On Sun, Apr 27

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-25 Thread David Hall
regularizing, are you including the regularizer in the objective value computation? GD is almost never worth your time. -- David On Fri, Apr 25, 2014 at 2:57 PM, DB Tsai dbt...@stanford.edu wrote: Another interesting benchmark. *News20 dataset - 0.14M row, 1,355,191 features, 0.034% non-zero

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread David Hall
remember what the problems were? I'm sure it could be better, but it's good to know where improvements need to happen. -- David On Apr 23, 2014, at 9:21 PM, DB Tsai dbt...@stanford.edu wrote: Hi all, I'm benchmarking Logistic Regression in MLlib using the newly added optimizer LBFGS

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread David Hall
Was the weight vector sparse? The gradients? Or just the feature vectors? On Wed, Apr 23, 2014 at 10:08 PM, DB Tsai dbt...@dbtsai.com wrote: The figure showing the Log-Likelihood vs Time can be found here.

Re: MLlib - logistic regression with GD vs LBFGS, sparse vs dense benchmark result

2014-04-23 Thread David Hall
, DB Tsai --- My Blog: https://www.dbtsai.com LinkedIn: https://www.linkedin.com/in/dbtsai On Wed, Apr 23, 2014 at 10:16 PM, David Hall d...@cs.berkeley.edu wrote: Was the weight vector sparse? The gradients? Or just the feature

Re: RFC: varargs in Logging.scala?

2014-04-11 Thread David Hall
Another usage that's nice is: logDebug { val timeS = timeMillis/1000.0; s"Time: $timeS" } which can be useful for more complicated expressions. On Thu, Apr 10, 2014 at 5:55 PM, Michael Armbrust mich...@databricks.com wrote: BTW... You can do calculations in string interpolation: s"Time:
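The block-argument style in this message relies on the log method taking its message by name, so the block only runs when the message is actually needed. Below is a minimal sketch of that idea; `MiniLogger` and `LoggingDemo` are hypothetical stand-ins written for illustration, not Spark's actual Logging trait, and the `built` counter exists only to make the lazy evaluation observable.

```scala
// Hypothetical stand-in for a logger with a by-name message parameter.
object MiniLogger {
  var debugEnabled: Boolean = false

  // `msg: => String` is a by-name parameter: the argument expression (or
  // block) is evaluated only if and when `msg` is referenced in the body.
  def logDebug(msg: => String): Unit =
    if (debugEnabled) println(msg)
}

object LoggingDemo {
  // Returns how many times the message block was actually evaluated.
  def run(timeMillis: Long): Int = {
    var built = 0

    MiniLogger.debugEnabled = false
    MiniLogger.logDebug {
      built += 1
      val timeS = timeMillis / 1000.0
      s"Time: $timeS"
    } // debug is off, so the block above never runs

    MiniLogger.debugEnabled = true
    MiniLogger.logDebug {
      built += 1
      val timeS = timeMillis / 1000.0
      s"Time: $timeS"
    } // debug is on, so the block runs once and the message is printed

    built
  }
}
```

With this shape, intermediate values such as `timeS` cost nothing when debug logging is disabled, which is the main reason to prefer a block over building the string eagerly at the call site.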

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-30 Thread David Hall
On Sun, Mar 30, 2014 at 2:01 PM, Debasish Das debasish.da...@gmail.com wrote: Hi David, I have started to experiment with BFGS solvers for Spark GLM over large scale data... I am also looking to add a good QP solver in breeze that can be used in Spark ALS for constraint solves...More

Re: Making RDDs Covariant

2014-03-22 Thread David Hall
have implicit collisions and philosophical concerns about what it means to serialize/deserialize a Parent class and a Child class... You should (almost) never make a typeclass param contravariant. It's almost certainly not what you want: https://issues.scala-lang.org/browse/SI-2509 -- David

graphx samples in Java

2014-03-21 Thread David Soroko
)), (7L, (jgonzal, postdoc)), (5L, (franklin, prof)), (2L, (istoica, prof thanks --david

Code documentation

2014-03-15 Thread David Thomas
Is there any documentation available that explains the code architecture that can help a new Spark framework developer?

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-06 Thread David Hall
On Thu, Mar 6, 2014 at 4:21 PM, DB Tsai dbt...@alpinenow.com wrote: Hi David, I can converge to the same result with your breeze LBFGS and Fortran implementations now. Probably, I made some mistakes when I tried breeze before. I apologize that I claimed it's not stable. See the test case

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-05 Thread David Hall
On Wed, Mar 5, 2014 at 8:50 AM, Debasish Das debasish.da...@gmail.com wrote: Hi David, Few questions on breeze solvers: 1. I feel the right place of adding useful things from RISO LBFGS (based on Professor Nocedal's fortran code) will be breeze. It will involve stress testing breeze LBFGS

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-05 Thread David Hall
On Wed, Mar 5, 2014 at 1:57 PM, DB Tsai dbt...@alpinenow.com wrote: Hi David, On Tue, Mar 4, 2014 at 8:13 PM, dlwh david.lw.h...@gmail.com wrote: I'm happy to help fix any problems. I've verified at points that the implementation gives the exact same sequence of iterates for a few

Re: MLLib - Thoughts about refactoring Updater for LBFGS?

2014-03-05 Thread David Hall
I did not. They would be nice to have. On Wed, Mar 5, 2014 at 5:21 PM, Debasish Das debasish.da...@gmail.com wrote: David, There used to be standard BFGS testcases in Professor Nocedal's package...did you stress test the solver with them? If not I will shoot him an email for them