Nice to hear that your experiment is consistent with my assumption. The
current L1/L2 penalizes the intercept as well, which is not ideal.
I'm working on GLMNET in Spark using OWLQN, and I can get exactly the
same solution as R, but with scalability in the number of rows and
columns. Stay tuned!
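For context, the elastic-net objective that GLMNET solves leaves the intercept out of the regularization term; a sketch in standard notation (not taken from the thread): only the coefficient vector w is penalized, while the intercept \beta_0 is fit freely:

```latex
\min_{\beta_0,\, w} \; \frac{1}{n} \sum_{i=1}^{n} \ell\!\left(y_i,\; \beta_0 + w^{\top} x_i\right)
\;+\; \lambda \left[ \alpha \lVert w \rVert_1 + \frac{1-\alpha}{2} \lVert w \rVert_2^2 \right]
```

Penalizing \beta_0 as well (as described above) shifts the fitted intercept toward zero, which is why the solutions differ from R's glmnet.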
Sincerely,
Hi, apologies if I missed a FAQ somewhere.
I am trying to submit a bug fix for the very first time. Following the
instructions, I forked the Git repo (at
c9ae79fba25cd49ca70ca398bc75434202d26a97) and am trying to run the tests.
I ran this: ./dev/run-tests _SQL_TESTS_ONLY=true
and after a while get the
Hi Spark community
Having spent some time getting up to speed with the various Spark components
in the core package, I've written a blog post to help other newcomers and
contributors.
By no means am I a Spark expert so would be grateful for any advice,
comments or edit suggestions.
Thanks very much
_RUN_SQL_TESTS needs to be true as well. Those two _... variables get set
correctly when the tests are run on Jenkins. They're not meant to be
manipulated directly by testers.
Did you want to run SQL tests only locally? You can try faking being
Jenkins by setting AMPLAB_JENKINS=true before calling
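Putting the advice above together, a local invocation might look like the following. This is a sketch, not an officially supported workflow: the `_RUN_SQL_TESTS` and `_SQL_TESTS_ONLY` variables are internal to the script and could change, and I'm assuming you run this from the Spark repo root.

```shell
# Fake being Jenkins so run-tests honors the internal _... variables.
# Assumption: executed from the root of a Spark checkout.
export AMPLAB_JENKINS=true
export _RUN_SQL_TESTS=true    # must also be true, per the note above
export _SQL_TESTS_ONLY=true   # restrict the run to the SQL tests
./dev/run-tests
```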
Performance-wise, will foreign data formats be supported the same as the native ones?
Thanks,
James
On Wed, Oct 8, 2014 at 11:03 PM, Cheng Lian lian.cs@gmail.com wrote:
The foreign data source API PR also matters here
https://www.github.com/apache/spark/pull/2475
Foreign data sources like ORC can be
Yes, the foreign sources work is only about exposing a stable set of APIs
for external libraries to link against (to avoid the spark assembly
becoming a dependency mess). The code path these APIs use will be the same
as that for datasources included in the core spark sql library.
Michael
On
Also, in general for SQL-only changes it is sufficient to run sbt/sbt
catalyst/test sql/test hive/test. The hive/test part takes the
longest, so I usually leave that out until just before submitting unless my
changes are Hive-specific.
On Thu, Oct 9, 2014 at 11:40 AM, Nicholas Chammas
Thanks for the feedback. For 1, there is an open patch:
https://github.com/apache/spark/pull/2659. For 2, broadcast blocks actually use
MEMORY_AND_DISK storage, so they will spill to disk if you have low memory, but
they're faster to access otherwise.
Matei
On Oct 9, 2014, at 12:11 PM,
Oops, I forgot to add: for 2, maybe we can add a flag to use DISK_ONLY for
TorrentBroadcast, or use it automatically if the broadcasts are bigger than
some size.
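The idea above could be sketched as a small selection helper. This is illustrative only: the flag name, the threshold, and the modeled StorageLevel enum are assumptions for the sketch, not real Spark configuration or Spark's actual StorageLevel class.

```scala
// Illustrative sketch: pick a storage level for a broadcast block based on
// a hypothetical force-disk flag and size threshold. StorageLevel is
// modeled as a plain ADT here, not Spark's real class.
object BroadcastStorage {
  sealed trait StorageLevel
  case object MEMORY_AND_DISK extends StorageLevel
  case object DISK_ONLY extends StorageLevel

  // Hypothetical defaults; not actual Spark configuration keys or values.
  def levelFor(broadcastSizeBytes: Long,
               forceDiskOnly: Boolean = false,
               diskOnlyThreshold: Long = 512L * 1024 * 1024): StorageLevel =
    if (forceDiskOnly || broadcastSizeBytes > diskOnlyThreshold) DISK_ONLY
    else MEMORY_AND_DISK
}
```

Small broadcasts keep the fast MEMORY_AND_DISK path described above, while very large ones (or an explicit flag) would go straight to disk.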
Matei
On Oct 9, 2014, at 3:04 PM, Matei Zaharia matei.zaha...@gmail.com wrote:
Thanks for the feedback. For 1, there is an open patch:
Sounds great, thanks!
On Thu, Oct 9, 2014 at 2:22 PM, Michael Armbrust mich...@databricks.com
wrote:
Yes, the foreign sources work is only about exposing a stable set of APIs
for external libraries to link against (to avoid the spark assembly
becoming a dependency mess). The code path
Does it make sense to point the Spark PR review board to read from
mesos/spark-ec2 as well? PRs submitted against that repo may reference
Spark JIRAs and need review just like any other Spark PR.
Nick
Hi Community,
I'm using Spark 1.0.2, with Spark SQL to run Hive SQL.
When I run the following code in the Spark shell:
val file = sc.textFile("./README.md")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
Correct and no error!
Dear Community,
Please ignore my last post about Spark SQL.
When I run:
val file = sc.textFile("./README.md")
val count = file.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
count.collect()
it happens too.
is there any possible reason for
Hello,
Sorry for the late reply.
When I tried the LogQuery example this time, things now seem to be fine!
...
14/10/10 04:01:21 INFO scheduler.DAGScheduler: Stage 0 (collect at
LogQuery.scala:80) finished in 0.429 s
14/10/10 04:01:21 INFO scheduler.TaskSchedulerImpl: Removed TaskSet 0.0,