Re: SPIP: Property Graphs, Cypher Queries, and Algorithms

2018-11-13 Thread Xiangrui Meng
+Joseph Gonzalez +Ankur Dave

On Tue, Nov 13, 2018 at 2:55 AM Martin Junghanns wrote:
> Hi Spark community,
>
> We would like to propose a new graph module for Apache Spark with support
> for Property Graphs, Cypher graph queries and graph algorithms built on top
> of the DataFrame API.
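The proposal above is truncated, but the data model it names (property graphs with node and relationship tables, queried in a Cypher-like way) can be sketched minimally in plain Python. The table layout and the toy pattern matcher below are assumptions for illustration only, not the API proposed in the SPIP:

```python
# Minimal property-graph sketch: nodes and relationships as tables
# (lists of dicts standing in for DataFrames). Illustration only --
# not the module proposed in the SPIP.
nodes = [
    {"id": 0, "label": "Person", "name": "Alice"},
    {"id": 1, "label": "Person", "name": "Bob"},
    {"id": 2, "label": "City", "name": "Berlin"},
]
rels = [
    {"src": 0, "dst": 2, "type": "LIVES_IN"},
    {"src": 1, "dst": 2, "type": "LIVES_IN"},
]

def match(nodes, rels, src_label, rel_type, dst_label):
    """Toy analogue of Cypher's MATCH (a:Src)-[:TYPE]->(b:Dst)."""
    by_id = {n["id"]: n for n in nodes}
    return [
        (by_id[r["src"]]["name"], by_id[r["dst"]]["name"])
        for r in rels
        if r["type"] == rel_type
        and by_id[r["src"]]["label"] == src_label
        and by_id[r["dst"]]["label"] == dst_label
    ]

print(match(nodes, rels, "Person", "LIVES_IN", "City"))
# [('Alice', 'Berlin'), ('Bob', 'Berlin')]
```

The point of the tabular layout is that both node and relationship sets map naturally onto DataFrames, which is what lets graph queries reuse the existing DataFrame machinery.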

Looking for spark connector for SQS

2018-11-13 Thread Pawan Gandhi
Hi All, I searched for a connector to connect Spark with SQS but could not find any. Could someone provide a pointer? Regards, Pawan

RE: Looking for spark connector for SQS

2018-11-13 Thread Jagwani, Prakash
Did you try the SQS JMS Client? https://docs.aws.amazon.com/AWSSimpleQueueService/latest/SQSDeveloperGuide/sqs-java-message-service-jms-client.html Thanks, Prakash Jagwani From: Pawan Gandhi Sent: Tuesday, November 13, 2018 1:14 PM To: dev@spark.apache.org Subject: Looking for spark connector

which classes/methods are considered as private in Spark?

2018-11-13 Thread Wenchen Fan
Hi all, Recently I updated the MiMa exclusion rules, and found MiMa tracks some private classes/methods unexpectedly. Note that "private" here means we have no compatibility guarantee: we don't provide documentation, and users take the risk when using them. In the API document,

Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Sean Owen
You should find that 'surprisingly public' classes are there because of language technicalities. For example DummySerializerInstance is public because it's a Java class, and can't be used outside its package otherwise. Likewise, I think MiMa just looks at bytecode, and private[spark] classes are

Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Marcelo Vanzin
On Tue, Nov 13, 2018 at 6:26 PM Wenchen Fan wrote: > Recently I updated the MiMa exclusion rules, and found MiMa tracks some > private classes/methods unexpectedly. Could you clarify what you mean here? Mima has some known limitations such as not handling "private[blah]" very well (because that

DataSourceV2 sync tomorrow

2018-11-13 Thread Ryan Blue
Hi everyone, I just wanted to send out a reminder that there’s a DSv2 sync tomorrow at 17:00 PST, which is 01:00 UTC. Here are some of the topics under discussion in the last couple of weeks: - Read API for v2 - see Wenchen’s doc

Re: DataSourceV2 sync tomorrow

2018-11-13 Thread Cody Koeninger
Am I the only one for whom the livestream link didn't work last time? Would like to be able to at least watch the discussion this time around. On Tue, Nov 13, 2018 at 6:01 PM Ryan Blue wrote: > > Hi everyone, > I just wanted to send out a reminder that there’s a DSv2 sync tomorrow at > 17:00

New PySpark test style

2018-11-13 Thread Hyukjin Kwon
Hi all, Lately, https://github.com/apache/spark/pull/23021 was merged, which splits the big single file that contains all the tests into smaller files. I picked one example to follow: NumPy, because the current style looks closer to NumPy's structure and is easier to follow. Please see
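The style described above favors one small, focused test module per topic rather than a single huge file, in the spirit of NumPy's tests/ layout. A minimal sketch of one such module (the class and test names here are hypothetical, not the ones used in the actual PR):

```python
# Sketch of the smaller-file test style: one focused module per topic,
# as in NumPy's tests/ layout. Names are hypothetical, for illustration.
import unittest

class ArithmeticOpsTests(unittest.TestCase):
    """One small, topic-scoped test class per file instead of one huge file."""

    def test_addition(self):
        self.assertEqual(1 + 2, 3)

    def test_multiplication(self):
        self.assertEqual(3 * 4, 12)

if __name__ == "__main__":
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(ArithmeticOpsTests)
    result = unittest.TextTestRunner(verbosity=0).run(suite)
    assert result.wasSuccessful()
```

Smaller modules like this can be discovered and run individually, which also shortens the edit-run cycle when only one area of the code is being changed.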

Re: DataSourceV2 sync tomorrow

2018-11-13 Thread Arun Mahadevan
IMO, the currentOffset should not be optional. For continuous mode I assume this offset gets periodically checkpointed (so it is mandatory)? For micro-batch mode the currentOffset would be the start offset of a micro-batch. And if the micro-batch could be executed without knowing the 'latest'
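The offset semantics under discussion can be illustrated with a toy reader: the current offset is the start of the next micro-batch, it advances (is "checkpointed") after each batch, and executing a batch does not require knowing the source's latest offset up front. This is a sketch of the concept only, not the DataSourceV2 interface:

```python
# Toy illustration of micro-batch offset semantics: currentOffset is the
# start of the next batch and advances after each one. Not the actual
# DataSourceV2 API -- names and shapes here are invented for illustration.
class ToyMicroBatchReader:
    def __init__(self, data):
        self.data = data
        self.current_offset = 0  # start offset of the next micro-batch

    def next_batch(self, max_records):
        """Read up to max_records starting at current_offset."""
        start = self.current_offset
        end = min(start + max_records, len(self.data))
        batch = self.data[start:end]
        self.current_offset = end  # "checkpoint" the new offset
        return start, batch

reader = ToyMicroBatchReader(["a", "b", "c", "d", "e"])
print(reader.next_batch(2))  # (0, ['a', 'b'])
print(reader.next_batch(2))  # (2, ['c', 'd'])
```

Note the reader never consults a "latest offset" to produce a batch; it only needs its own current position, which is the point being argued above.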

Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Wenchen Fan
> Could you clarify what you mean here? Mima has some known limitations such as not handling "private[blah]" very well Yes, that's what I mean. What I want to know is which classes/methods we expect to be private. I think things marked as "private[blabla]" are expected to be private

Re: which classes/methods are considered as private in Spark?

2018-11-13 Thread Reynold Xin
I used to, before each release during the RC phase, go through every single doc page to make sure we don’t unintentionally leave things public. I no longer have time to do that unfortunately. I find that very useful because I always catch some mistakes through organic development. > On Nov 13,

Re: time for Apache Spark 3.0?

2018-11-13 Thread Sean Owen
As far as I know, any JIRA that has implications for users is tagged this way, but I haven't examined all of them. All that are going in for 3.0 should have it as Fix Version. Most changes won't have a user-visible impact. Do you see any that seem to need the tag? Call them out or even fix them by

Re: time for Apache Spark 3.0?

2018-11-13 Thread Matt Cheah
I just added the label to https://issues.apache.org/jira/browse/SPARK-25908. Unsure if there are any others. I’ll look through the tickets and see if there are any that are missing the label. -Matt Cheah From: Sean Owen Date: Tuesday, November 13, 2018 at 12:09 PM To: Matt Cheah Cc:

Re: SPIP: SPARK-25728 Structured Intermediate Representation (Tungsten IR) for generating Java code

2018-11-13 Thread Kazuaki Ishizaki
Hi all, I spent some time considering the great points raised. Sorry for my delay. I put comments in green into https://docs.google.com/document/d/1Jzf56bxpMpSwsGV_hSzl9wQG22hyI731McQcjognqxY/edit Here is a summary of the comments: 1) For simplicity and expressiveness, introduce nodes to represent a structure

Re: time for Apache Spark 3.0?

2018-11-13 Thread Matt Cheah
The release-notes label on JIRA sounds good. Can we make it a point to have that done retroactively now, and then moving forward? On 11/12/18, 4:01 PM, "Sean Owen" wrote: My non-definitive takes -- I would personally like to remove all deprecated methods for Spark 3. I