+1 (non-binding)
Regards
Noman
From: Xiao Li
Sent: Tuesday, September 12, 2017 2:44:26 AM
To: Matei Zaharia; Hyukjin Kwon
Cc: spark-dev
Subject: Re: [VOTE][SPIP] SPARK-21190: Vectorized UDFs in Python
+1
Xiao
On Mon, 11 Sep 2017 at 6:44 PM Matei Zaharia
mailto:m
I think the right way to look at this is that the batchId is just a proxy for
offsets that is agnostic to what type of source you are reading from (or
how many sources there are). We might call into a custom sink with the
same batchId more than once, but it will always contain the same data
(there is n
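The retry semantics described above (a sink may be called more than once with the same batchId, but always with the same data) can be sketched with a toy idempotent sink. The class and method names here are hypothetical illustrations, not Spark's actual sink API:

```python
# Hypothetical sketch, not Spark's API: a sink that stays idempotent under
# retries by skipping any batchId it has already committed. This is safe
# precisely because a given batchId always carries the same data.

class IdempotentSink:
    """Writes each batch exactly once, keyed by batchId."""

    def __init__(self):
        self.committed = set()   # batchIds already written
        self.rows = []           # destination "table"

    def write_batch(self, batch_id, data):
        if batch_id in self.committed:
            return False         # replayed delivery of a committed batch: no-op
        self.rows.extend(data)
        self.committed.add(batch_id)
        return True


sink = IdempotentSink()
sink.write_batch(0, ["a", "b"])
sink.write_batch(0, ["a", "b"])  # retried delivery with the same batchId
sink.write_batch(1, ["c"])
print(sink.rows)  # -> ['a', 'b', 'c']
```

The second call with batchId 0 is a no-op, so the destination ends up with each batch applied exactly once regardless of how many times the engine retries delivery.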
Mark, I agree with your point on the risks of using CloudFront while
building Spark. I was only trying to provide background on when we
started using CloudFront.
Personally, I don't have enough context about the test case in
question (e.g., why are we downloading Spark in a test case?).
Tha
Yeah, but that discussion and use case are a bit different -- providing a
different route to download the final released and approved artifacts that
were built using only acceptable artifacts and sources, vs. building and
checking prior to release using something that is not from an Apache
mirror. Th
Ah right yeah I know it's an S3 bucket. Thanks for the context. Although I
imagine the reasons it was set up no longer apply so much (you can get a
direct mirror download link), and so it would probably be possible to
retire this, there's also no big rush to. I wasn't clear from the thread
whether
The bucket comes from CloudFront, a CDN that's part of AWS. There was a
bunch of discussion about this back in 2013
https://lists.apache.org/thread.html/9a72ff7ce913dd85a6b112b1b2de536dcda74b28b050f70646aba0ac@1380147885@%3Cdev.spark.apache.org%3E
Shivaram
On Wed, Sep 13, 2017 at 9:30 AM, Sean Owe
Not a big deal, but Mark noticed that this test now downloads Spark
artifacts from the same 'direct download' link available on the downloads
page:
https://github.com/apache/spark/blob/master/sql/hive/src/test/scala/org/apache/spark/sql/hive/HiveExternalCatalogVersionsSuite.scala#L53
https://d3kb
Hello,
I am new to the Spark dev community, and to open source in general, but have
used Spark extensively.
I want to create a complete anomaly detection component in Spark MLlib.
To that end, I would like to know if someone could guide me so I can start the
development and contribute to Spark MLlib.
Sorry
You might be interested in "Maximum Flow implementation on Spark GraphX" done
by a Colorado School of Mines grad student a couple of years ago.
http://datascienceassn.org/2016-01-27-maximum-flow-implementation-spark-graphx
From: Swapnil Shinde
To: u...@spark.ap
Thanks, I see.
However, I guess reading from the checkpoint directory might be less efficient
than just preserving offsets in the Dataset.
I have one more question about operation idempotence (hope it helps others
to have a clear picture).
If I read offsets on re-start from RDBMS and manually specif