+1 (non-binding)
Regards
Noman
From: Xiao Li
Sent: Tuesday, September 12, 2017 2:44:26 AM
To: Matei Zaharia; Hyukjin Kwon
Cc: spark-dev
Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python
+1
Xiao
On Mon, 11 Sep 2017 at 6:44 PM Matei Zaharia
mailto:m
I think the right way to look at this is that the batchId is just a proxy for
offsets that is agnostic to what type of source you are reading from (or
how many sources there are). We might call into a custom sink with the
same batchId more than once, but it will always contain the same data
(there is n
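The retry semantics described above (a sink may be called more than once with the same batchId, but always with the same data) can be sketched with a toy idempotent sink. The class and method names here are hypothetical illustrations, not Spark's actual sink API:

```python
# Hypothetical sketch, not Spark's API: a sink that stays idempotent under
# retries by skipping any batchId it has already committed. This is safe
# precisely because a given batchId always carries the same data.

class IdempotentSink:
    """Writes each batch exactly once, keyed by batchId."""

    def __init__(self):
        self.committed = set()   # batchIds already written
        self.rows = []           # destination "table"

    def write_batch(self, batch_id, data):
        if batch_id in self.committed:
            return False         # replayed delivery of a committed batch: no-op
        self.rows.extend(data)
        self.committed.add(batch_id)
        return True


sink = IdempotentSink()
sink.write_batch(0, ["a", "b"])
sink.write_batch(0, ["a", "b"])  # retried delivery with the same batchId
sink.write_batch(1, ["c"])
print(sink.rows)  # -> ['a', 'b', 'c']
```

The second call with batchId 0 is a no-op, so the destination ends up with each batch applied exactly once regardless of how many times the engine retries delivery.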
Mark, I agree with your point on the risks of using CloudFront while
building Spark. I was only trying to provide background on when we
started using CloudFront.
Personally, I don't have enough context about the test case in
question (e.g., why are we downloading Spark in a test case?).
Tha
Yeah, but that discussion and use case are a bit different -- providing a
different route to download the final released and approved artifacts that
were built using only acceptable artifacts and sources, vs. building and
checking prior to release using something that is not from an Apache
mirror. Th
Ah right yeah I know it's an S3 bucket. Thanks for the context. Although I
imagine the reasons it was set up no longer apply so much (you can get a
direct mirror download link), and so it would probably be possible to
retire this, there's also no big rush to. I wasn't clear from the thread
whether
The bucket comes from CloudFront, a CDN that's part of AWS. There was a
bunch of discussion about this back in 2013
https://lists.apache.org/thread.html/9a72ff7ce913dd85a6b112b1b2de536dcda74b28b050f70646aba0ac@1380147885@%3Cdev.spark.apache.org%3E
Shivaram
On Wed, Sep 13, 2017 at 9:30 AM, Sean Owe
Not a big deal, but Mark noticed that this test now downloads Spark
artifacts from the same 'direct download' link available on the downloads
page:
https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala#L53
https://d3kb
Hello,
I am new to the Spark dev community, and to open source in general, but have
used Spark extensively.
I want to create a complete anomaly detection component in Spark MLlib.
To that end, I would like to know if someone could guide me so I can start the
development and contribute to Spark MLlib.
Sorry
You might be interested in "Maximum Flow implementation on Spark GraphX" done
by a Colorado School of Mines grad student a couple of years ago.
http://datascienceassn.org/2016-01-27-maximum-flow-implementation-spark-graphx
From: Swapnil Shinde
To: u...@spark.ap
Thanks, I see.
However, I guess reading from the checkpoint directory might be less efficient
than just preserving offsets in the Dataset.
I have one more question about operation idempotence (hope it helps others
to have a clear picture).
If I read offsets on re-start from RDBMS and manually specif