Hi,
I've been working on this issue, and I would like to get your feedback on
the following approach. The idea is that instead of failing in
`TaskSetManager.abortIfCompletelyBlacklisted`, when a task cannot be
scheduled on any executor but dynamic allocation is enabled, we will
register this task
Hi,
Are there any comments or suggestions regarding this proposal?
Thanks,
Juan
On Mon, Oct 16, 2017 at 10:27 AM, Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com> wrote:
> Hi all,
>
> I have a prototype for "Keep track of nodes which are going to be shut
>
Hi all,
I have a prototype for "Keep track of nodes which are going to be shut down
& avoid scheduling new tasks" (
https://issues.apache.org/jira/browse/SPARK-20628) that I would like to
discuss with the community. I added a WIP PR for that in
https://github.com/apache/spark/pull/19267. The
Hi Sim,
I understand that what you propose is defining a trait SparkIterable (and
also PairSparkIterable for RDDs of pairs) that encapsulates the methods of
RDDs, and then programming against that trait instead of RDD. That is similar to
programming against scala.collection.GenSeq to abstract over using
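A rough sketch of how I picture that, with a hypothetical SparkIterable trait and
RDD-backed and local-collection-backed instances (all names here are made up for
illustration, not an actual proposal of the API):

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical abstraction over "things we can map/filter over", so the same
// code can run either on an RDD or on a local collection.
trait SparkIterable[A] {
  def map[B: ClassTag](f: A => B): SparkIterable[B]
  def filter(p: A => Boolean): SparkIterable[A]
}

// RDD-backed instance: operations delegate to the underlying RDD.
class RDDIterable[A: ClassTag](rdd: RDD[A]) extends SparkIterable[A] {
  def map[B: ClassTag](f: A => B): SparkIterable[B] = new RDDIterable(rdd.map(f))
  def filter(p: A => Boolean): SparkIterable[A] = new RDDIterable(rdd.filter(p))
}

// Local instance: the same operations over an in-memory Seq.
class LocalIterable[A](seq: Seq[A]) extends SparkIterable[A] {
  def map[B: ClassTag](f: A => B): SparkIterable[B] = new LocalIterable(seq.map(f))
  def filter(p: A => Boolean): SparkIterable[A] = new LocalIterable(seq.filter(p))
}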
Hi,
Sorry to insist, but does anyone have any thoughts on this? Or could someone at
least point me to documentation for DStream.compute() so I can understand when I
should return None for a batch?
Thanks
Juan
2015-09-14 20:51 GMT+02:00 Juan Rodríguez Hortalá <
juan.rodriguez.hort...@gmail.com>:
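For reference, a minimal sketch of a custom input DStream that implements the
compute() method in question, assuming we just replay one fixed RDD every batch
(the class name is made up for illustration):

import org.apache.spark.rdd.RDD
import org.apache.spark.streaming.{StreamingContext, Time}
import org.apache.spark.streaming.dstream.InputDStream
import scala.reflect.ClassTag

// compute returns Some(rdd) with the data for this batch interval,
// or None when there is no RDD to produce for that batch.
class SingleRDDInputDStream[T: ClassTag](ssc: StreamingContext, rdd: RDD[T])
    extends InputDStream[T](ssc) {
  override def start(): Unit = {}
  override def stop(): Unit = {}
  override def compute(validTime: Time): Option[RDD[T]] = Some(rdd)
}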
Hi,
I sent this message to the user list a few weeks ago with no luck, so I'm
forwarding it to the dev list in case someone could give a hand with this.
Thanks a lot in advance
I've developed a ScalaCheck property for testing Spark Streaming
transformations. To do that I had to develop a custom
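As a much simpler flavour of the idea, a sketch of a ScalaCheck property for a
plain batch transformation rather than a streaming one (this is not the custom
machinery mentioned above, just an illustration of the property style):

import org.apache.spark.{SparkConf, SparkContext}
import org.scalacheck.Prop.forAll
import org.scalacheck.Properties

object SparkMapSpec extends Properties("spark map") {
  // Local SparkContext for the test; a real suite would manage its lifecycle.
  val sc = new SparkContext(
    new SparkConf().setMaster("local[2]").setAppName("scalacheck-sketch"))

  // Check that applying a function through Spark matches applying it locally.
  property("map matches local map") = forAll { (xs: List[Int]) =>
    sc.parallelize(xs).map(_ * 2).collect().toList == xs.map(_ * 2)
  }
}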
Hi,
Maybe you could use zipWithIndex and filter to skip the first elements. For
example, starting from
scala> sc.parallelize(100 to 120, 4).zipWithIndex.collect
res12: Array[(Int, Long)] = Array((100,0), (101,1), (102,2), (103,3),
(104,4), (105,5), (106,6), (107,7), (108,8), (109,9), (110,10),
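A minimal sketch of that zipWithIndex/filter approach, assuming we want to drop
the first n elements (skipFirst is a made-up helper name):

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Pair each element with its global index and drop the ones below n.
def skipFirst[T: ClassTag](rdd: RDD[T], n: Long): RDD[T] =
  rdd.zipWithIndex()
    .filter { case (_, idx) => idx >= n }
    .map { case (value, _) => value }

// e.g. skipFirst(sc.parallelize(100 to 120, 4), 3).collect()
// would return Array(103, 104, ..., 120)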
will save time on each repetition.
I will also do some experiments. Regarding RDDs with few repetitions: in which
use cases would we lose efficiency? I will test that as well.
What do I need to do to take this on? Just create a ticket in JIRA?
2015-07-19 21:56 GMT+03:00 Juan Rodríguez Hortalá
juan.rodriguez.hort
Hi,
My two cents is that this could be interesting if all RDD and pair
RDD operations were lifted to work on grouped RDDs. For example, as
suggested, a map on grouped RDDs would be more efficient if the original RDD
had lots of duplicate entries, but for RDDs with few repetitions I guess
you
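A sketch of what I mean by lifting map to a grouped representation, assuming a
hypothetical grouped RDD encoded as (value, count) pairs (the helper names are
made up):

import org.apache.spark.rdd.RDD
import scala.reflect.ClassTag

// Hypothetical "grouped RDD": each distinct value paired with how many times
// it occurs in the original RDD.
def toGrouped[A: ClassTag](rdd: RDD[A]): RDD[(A, Long)] =
  rdd.map(a => (a, 1L)).reduceByKey(_ + _)

// map lifted to the grouped representation: f is applied once per distinct
// value instead of once per element, which pays off with many duplicates but
// adds overhead when values are mostly unique.
def mapGrouped[A, B: ClassTag](grouped: RDD[(A, Long)])(f: A => B): RDD[(B, Long)] =
  grouped.map { case (a, count) => (f(a), count) }.reduceByKey(_ + _)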
Hi,
You can connect via JDBC as described in
https://spark.apache.org/docs/latest/sql-programming-guide.html#jdbc-to-other-databases.
Another option is using HadoopRDD and NewHadoopRDD to connect to databases
compatible with Hadoop, like HBase; some examples can be found in chapter 5
of Learning
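For the JDBC route, a minimal sketch using the DataFrame JDBC reader (the URL,
table and credentials are placeholders):

import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().appName("JdbcExample").getOrCreate()

// Read a table from a JDBC-compatible database into a DataFrame.
val df = spark.read
  .format("jdbc")
  .option("url", "jdbc:postgresql://dbhost:5432/mydb") // placeholder URL
  .option("dbtable", "public.my_table")                // placeholder table
  .option("user", "user")
  .option("password", "password")
  .load()

df.printSchema()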
Hi,
If you want, I would be happy to work on this. I have worked with
KafkaUtils.createDirectStream before, in a pull request that wasn't
accepted (https://github.com/apache/spark/pull/5367). I'm fluent in Python
and starting to feel comfortable with Scala, so if someone opens a JIRA
I can
Hi,
You can use the method repartition from DStream (for the Scala API) or
JavaDStream (for the Java API)
def repartition(numPartitions: Int): DStream[T]
https://spark.apache.org/docs/latest/api/scala/org/apache/spark/streaming/dstream/DStream.html
Return a new DStream with an increased or decreased level of parallelism.
into separate RDDs.
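A minimal sketch of using repartition on a DStream (the socket source, port and
partition count are just placeholders):

import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

val conf = new SparkConf().setAppName("RepartitionExample").setMaster("local[2]")
val ssc = new StreamingContext(conf, Seconds(10))

// Placeholder source: each batch may arrive with very few partitions.
val lines = ssc.socketTextStream("localhost", 9999)

// Spread the records of every batch RDD across 8 partitions before the
// (potentially expensive) downstream processing.
val repartitioned = lines.repartition(8)
repartitioned.foreachRDD { rdd =>
  println(s"partitions in this batch: ${rdd.getNumPartitions}")
}

ssc.start()
ssc.awaitTermination()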
On Wed, Apr 29, 2015 at 2:10 PM, Juan Rodríguez Hortalá
juan.rodriguez.hort...@gmail.com wrote:
Hi Sébastien,
I came across a similar problem some time ago; you can see the discussion on
the Spark users mailing list at
http://markmail.org/message/fudmem4yy63p62ar#query:+page:1