Re: Handling questions in the mailing lists
Besides the eventual traffic issue, I don't believe that users would benefit from a standalone site. Some great answers are provided by users who aren't Spark experts but may be Java, Python, AWS or even systems experts. Why would we want to play alone? We are nevertheless trying to animate the Apache Spark chat room, which isn't as obvious as one might want it to be. I'd rather things stay the way they are on SO. There is a bunch of us who are actually very active and answer as much as we can, and we'll be glad to help. -- View this message in context: http://apache-spark-developers-list.1001551.n3.nabble.com/Handling-questions-in-the-mailing-lists-tp19690p20012.html Sent from the Apache Spark Developers List mailing list archive at Nabble.com. - To unsubscribe e-mail: dev-unsubscr...@spark.apache.org
RE: Handling questions in the mailing lists
…my 0.1 cent ☺ As a Spark and SO user, I would not find a separate SE a good thing.
* Part of the beauty of SO is that you can easily filter and track different topics from one dashboard.
* Being part of SO also gives good exposure, as it raises awareness of Spark across a wider audience.
* High-reputation users, even if they are, say, “Python centric”, add value by moderating and commenting.
* I don’t think Spark-specific is a good thing either. Spark is typically combined with a huge range of other technologies (Avro, Parquet, Hadoop, Python, R, Scala, Akka, Java, HBase, to name a few). Users who are specialists in these topics can provide value and help build quality in the Spark tag. By getting a new SE you kind of exclude them.
* It will take time to build up enough reputable users to share the moderation burden.
* A high-rep Java user is likely to ask a good question. By forcing people to join an SE where their rep is reset, you lose the ability to track the quality of your users (and, may I say, potential evangelists). By observation (no idea if true), questions by high-rep users attract much better attention than those from any user with a rep of 100 or less.
* Last but not least, high-rep users usually know, follow and enforce SO rules and best practices quite well, whereas a Spark-centric SE might not be as rule-focused. Even though rules can sometimes be annoying, overall they build quality questions, so more users get involved.
From: Sean Owen [mailto:so...@cloudera.com]
Sent: 24 November 2016 10:53
To: assaf.mendelson; dev@spark.apache.org
Subject: Re: Handling questions in the mailing lists

Here's a view into the requirements, for example: http://area51.stackexchange.com/proposals/76571/emacs

You're right there is a lot of activity on SO, easily 30-40 questions per day. One thing I noticed about, for example, the Data Science SE is that most questions relevant to it were still posted on SO or Cross Validated. It struggles as an SE even though there is, out there, more than enough activity that _should_ be on the specific SE. There are more niche things that end up working as an SE, so I'm not dead set against it, though it would remain unofficial and my gut is that it might just split the conversation yet further. I'd leave it, however, to anyone active on SO already to decide that it's worth a dedicated SE and just do it.

On Thu, Nov 24, 2016 at 10:45 AM assaf.mendelson <assaf.mendel...@rsa.com> wrote:
I am not sure what is enough traffic. Some of the SE groups already existing do not have that much traffic. Specifically the user mailing list has ~50 emails per day. It wouldn’t be much of a stretch to extract 1-2 questions per day from that.
In the regular stackoverflow the apache-spark had more than 50 new questions in the last 24 hours alone (http://stackoverflow.com/questions/tagged/apache-spark?sort=newest&pageSize=50). I believe this should be enough traffic (and the traffic would rise once quality answers begin to appear).

From: Sean Owen [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
Sent: Thursday, November 24, 2016 12:32 PM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

I don't think there's nearly enough traffic to sustain a stand-alone SE. I helped mod the Data Science SE and it's still not technically critical mass after 2 years. It would just fracture the discussion to yet another place.

On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson <[hidden email]> wrote:
Sorry to reawaken this, but I just noticed it is possible to propose new topic specific sites (http://area51.stackexchange.com/faq
Re: Handling questions in the mailing lists
Here's a view into the requirements, for example: http://area51.stackexchange.com/proposals/76571/emacs You're right there is a lot of activity on SO, easily 30-40 questions per day. One thing I noticed about, for example, the Data Science SE is that most questions relevant to it were still posted on SO or Cross Validated. It struggles as an SE even though there is, out there, more than enough activity that _should_ be on the specific SE. There are more niche things that end up working as an SE, so I'm not dead set against it, though it would remain unofficial and my gut is that it might just split the conversation yet further. I'd leave it, however, to anyone active on SO already to decide that it's worth a dedicated SE and just do it. On Thu, Nov 24, 2016 at 10:45 AM assaf.mendelson <assaf.mendel...@rsa.com> wrote: > I am not sure what is enough traffic. Some of the SE groups already > existing do not have that much traffic. > > Specifically the user mailing list has ~50 emails per day. It wouldn’t be > much of a stretch to extract 1-2 questions per day from that. In the > regular stackoverflow the apache-spark had more than 50 new questions in > the last 24 hours alone ( > http://stackoverflow.com/questions/tagged/apache-spark?sort=newest=50). > > > > > I believe this should be enough traffic (and the traffic would rise once > quality answers begin to appear). > > > > > > *From:* Sean Owen [via Apache Spark Developers List] [mailto:ml-node+[hidden > email] <http:///user/SendEmail.jtp?type=node=20008=0>] > *Sent:* Thursday, November 24, 2016 12:32 PM > > > *To:* Mendelson, Assaf > *Subject:* Re: Handling questions in the mailing lists > > > > I don't think there's nearly enough traffic to sustain a stand-alone SE. I > helped mod the Data Science SE and it's still not technically critical mass > after 2 years. It would just fracture the discussion to yet another place. 
> > On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson <[hidden email] > <http:///user/SendEmail.jtp?type=node=20007=0>> wrote: > > Sorry to reawaken this, but I just noticed it is possible to propose new > topic specific sites (http://area51.stackexchange.com/faq) for stack > overflow. So for example we might have a spark.stackexchange.com spark > specific site. > > The advantage of such a site are many. First of all it is spark specific. > Secondly the reputation of people would be on spark and not on general > questions and lastly (and most importantly in my opinion) it would have > spark based moderators (which are all spark moderator as opposed to general > technology). > > > > The process of creating such a site is not complicated. Basically someone > creates a proposal (I have no problem doing so). Then creating 5 example > questions (something we want on the site) and get 5 people need to ‘follow’ > it within 3 days. This creates a “definition” phase. The goal is to get at > least 40 questions that embody the goal of the site and have at least 10 > net votes and enough people follow it. When enough traction has been made > (enough questions and enough followers) then the site moves to commitment > phase. In this phase users “commit” to being on the site (basically this is > aimed to see the community of experts is big enough). Once all this happens > the site moves into beta. This means the site becomes active and it will > become a full site if it sees enough traction. > > > > I would suggest trying to set this up. > > > > Thanks, > > Assaf > > > > *If you reply to this email, your message will be added to the discussion > below:* > > > http://apache-spark-developers-list.1001551.n3.nabble.com/Handling-questions-in-the-mailing-lists-tp19690p20007.html > > To start a new topic under Apache Spark Developers List, email [hidden > email] <http:///user/SendEmail.jtp?type=node=20008=1> > To unsubscribe from Apache Spark Developers List, click here. 
RE: Handling questions in the mailing lists
I am not sure what counts as enough traffic. Some of the existing SE groups do not have that much traffic. Specifically, the user mailing list has ~50 emails per day. It wouldn’t be much of a stretch to extract 1-2 questions per day from that. On regular Stack Overflow, the apache-spark tag had more than 50 new questions in the last 24 hours alone (http://stackoverflow.com/questions/tagged/apache-spark?sort=newest&pageSize=50). I believe this should be enough traffic (and the traffic would rise once quality answers begin to appear).

From: Sean Owen [via Apache Spark Developers List] [mailto:ml-node+s1001551n2000...@n3.nabble.com]
Sent: Thursday, November 24, 2016 12:32 PM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

I don't think there's nearly enough traffic to sustain a stand-alone SE. I helped mod the Data Science SE and it's still not technically critical mass after 2 years. It would just fracture the discussion to yet another place.

On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson <[hidden email]> wrote:
Sorry to reawaken this, but I just noticed it is possible to propose new topic specific sites (http://area51.stackexchange.com/faq) for stack overflow. So for example we might have a spark.stackexchange.com spark specific site. The advantage of such a site are many. First of all it is spark specific. Secondly the reputation of people would be on spark and not on general questions and lastly (and most importantly in my opinion) it would have spark based moderators (which are all spark moderator as opposed to general technology). The process of creating such a site is not complicated. Basically someone creates a proposal (I have no problem doing so). Then creating 5 example questions (something we want on the site) and get 5 people need to ‘follow’ it within 3 days. This creates a “definition” phase.
The goal is to get at least 40 questions that embody the goal of the site and have at least 10 net votes and enough people follow it. When enough traction has been made (enough questions and enough followers) then the site moves to commitment phase. In this phase users “commit” to being on the site (basically this is aimed to see the community of experts is big enough). Once all this happens the site moves into beta. This means the site becomes active and it will become a full site if it sees enough traction. I would suggest trying to set this up. Thanks, Assaf
Re: Handling questions in the mailing lists
I don't think there's nearly enough traffic to sustain a stand-alone SE. I helped mod the Data Science SE and it's still not technically critical mass after 2 years. It would just fracture the discussion to yet another place.

On Thu, Nov 24, 2016 at 6:52 AM assaf.mendelson wrote:

> Sorry to reawaken this, but I just noticed it is possible to propose new
> topic specific sites (http://area51.stackexchange.com/faq) for stack
> overflow. So for example we might have a spark.stackexchange.com spark
> specific site.
>
> The advantage of such a site are many. First of all it is spark specific.
> Secondly the reputation of people would be on spark and not on general
> questions and lastly (and most importantly in my opinion) it would have
> spark based moderators (which are all spark moderator as opposed to general
> technology).
>
> The process of creating such a site is not complicated. Basically someone
> creates a proposal (I have no problem doing so). Then creating 5 example
> questions (something we want on the site) and get 5 people need to ‘follow’
> it within 3 days. This creates a “definition” phase. The goal is to get at
> least 40 questions that embody the goal of the site and have at least 10
> net votes and enough people follow it. When enough traction has been made
> (enough questions and enough followers) then the site moves to commitment
> phase. In this phase users “commit” to being on the site (basically this is
> aimed to see the community of experts is big enough). Once all this happens
> the site moves into beta. This means the site becomes active and it will
> become a full site if it sees enough traction.
>
> I would suggest trying to set this up.
>
> Thanks,
> Assaf
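The phase progression described in the quoted proposal can be sketched roughly as follows. This is only an illustration under stated assumptions: the Proposal class, the phase function, and the parameters followers_needed and commitments_needed are hypothetical names introduced here, since the thread only says "enough" followers and "enough" committed users. It also reads "at least 10 net votes" as a per-question requirement, which is an interpretation rather than something the thread states.

```python
from dataclasses import dataclass


@dataclass
class Proposal:
    """Hypothetical snapshot of a site proposal (illustrative field names)."""
    example_questions: int            # example questions posted so far
    followers_in_first_3_days: int    # followers gained within 3 days of creation
    followers: int                    # followers overall
    questions_with_10_net_votes: int  # questions with at least 10 net votes
    committed_users: int              # users who have "committed" to the site


def phase(p: Proposal, followers_needed: int, commitments_needed: int) -> str:
    """Return the phase a proposal would be in, per the steps in the thread.

    followers_needed and commitments_needed are assumptions: the thread
    gives no numbers for them, so the caller must supply values.
    """
    # 5 example questions and 5 followers within 3 days start the definition phase.
    if p.example_questions < 5 or p.followers_in_first_3_days < 5:
        return "proposal"
    # Definition ends with 40 well-voted questions plus enough followers.
    if p.questions_with_10_net_votes < 40 or p.followers < followers_needed:
        return "definition"
    # Commitment ends once enough users have committed; then the site enters beta.
    if p.committed_users < commitments_needed:
        return "commitment"
    return "beta"


if __name__ == "__main__":
    draft = Proposal(5, 5, 12, 3, 0)
    print(phase(draft, followers_needed=60, commitments_needed=200))
```

The two unspecified thresholds are left as parameters on purpose, so the sketch does not pretend to know numbers the thread never gives.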
RE: Handling questions in the mailing lists
Sorry to reawaken this, but I just noticed it is possible to propose new topic-specific sites (http://area51.stackexchange.com/faq) for Stack Overflow. So, for example, we might have a Spark-specific site at spark.stackexchange.com.

The advantages of such a site are many. First of all, it is Spark-specific. Secondly, people's reputation would be earned on Spark and not on general questions. Lastly (and most importantly in my opinion), it would have Spark-based moderators (all of them Spark moderators, as opposed to general technology moderators).

The process of creating such a site is not complicated. Basically someone creates a proposal (I have no problem doing so). Then 5 example questions have to be created (something we want on the site) and 5 people need to 'follow' it within 3 days. This starts the "definition" phase. The goal is to get at least 40 questions that embody the goal of the site and have at least 10 net votes, and to get enough people to follow it. When enough traction has been gained (enough questions and enough followers), the site moves to the commitment phase. In this phase users "commit" to being on the site (basically this is meant to check that the community of experts is big enough). Once all this happens, the site moves into beta. This means the site becomes active, and it will become a full site if it sees enough traction.

I would suggest trying to set this up.

Thanks,
Assaf

From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+s1001551n19916...@n3.nabble.com]
Sent: Wednesday, November 16, 2016 4:33 PM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

Awesome stuff! Thanks Sean! :-)

On Wed, Nov 16, 2016 at 05:57 Sean Owen <[hidden email]> wrote:
I updated the wiki to point to the /community.html page. (We're going to migrate the wiki real soon now anyway) I updated the /community.html page per this thread too.
PR: https://github.com/apache/spark-website/pull/16

On Tue, Nov 15, 2016 at 2:49 PM assaf.mendelson <[hidden email]> wrote:
Should probably also update the helping others section in the how to contribute section (https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingbyHelpingOtherUsers)
Assaf.

From: Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]]
Sent: Sunday, November 13, 2016 8:52 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

Hey Reynold,
Looks like we all of the proposed changes into Proposed Community Mailing Lists / StackOverflow Changes <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>. Anything else we can do to update the Spark Community page / welcome email? Meanwhile, let's all start answering questions on SO, eh?! :)
Denny

On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <[hidden email]> wrote:
That's a good question, looking at http://stackoverflow.com/tags/apache-spark/topusers shows a few contributors who have already been active on SO including some committers and PMC members with very high overall SO reputations for any administrative needs (as well as a number of other contributors besides just PMC/committers).

On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <[hidden email]> wrote:
I was just wondering, before we move on to SO. Do we have enough contributors with enough reputation to manage things in SO? We would need contributors with enough reputation to have relevant privileges.
For example: creating tags (requires 1500 reputation), editing questions and answers (2000), creating tag synonyms (2500), approving tag wiki edits (5000), access to moderator tools (10000, this is required to delete questions etc.), protecting questions (15000). All of these are important if we plan to have SO as a main resource. I know I originally suggested SO, however, if we do not have contributors with the required privileges and the willingness to help manage everything then I am not sure this is a good fit.
Assaf.

From: Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]]
Sent: Wednesday, November 09, 2016 9:54 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

Agreed that by simply just moving the questions to SO will not solve anything but I think the call out about the
Re: Handling questions in the mailing lists
I updated the wiki to point to the /community.html page. (We're going to migrate the wiki real soon now anyway) I updated the /community.html page per this thread too. PR: https://github.com/apache/spark-website/pull/16 On Tue, Nov 15, 2016 at 2:49 PM assaf.mendelson <assaf.mendel...@rsa.com> wrote: Should probably also update the helping others section in the how to contribute section (https://cwiki.apache.org/confluence/display/SPARK/ContributingtoSpark#ContributingtoSpark-ContributingbyHelpingOtherUsers <https://cwiki.apache.org/confluence/display/SPARK/Contributing+to&%2343;Spark%23ContributingtoSpark-ContributingbyHelpingOtherUsers> "> https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingbyHelpingOtherUsers) Assaf. From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden email] <http://user/SendEmail.jtp?type=node=19891=0>] Sent: Sunday, November 13, 2016 8:52 AM To: Mendelson, Assaf Subject: Re: Handling questions in the mailing lists Hey Reynold, Looks like we all of the proposed changes into Proposed Community Mailing Lists / StackOverflow Changes <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>. Anything else we can do to update the Spark Community page / welcome email? Meanwhile, let's all start answering questions on SO, eh?! :) Denny On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <[hidden email] <http://user/SendEmail.jtp?type=node=19835=0>> wrote: That's a good question, looking at http://stackoverflow.com/tags/apache-spark/topusers shows a few contributors who have already been active on SO including some committers and PMC members with very high overall SO reputations for any administrative needs (as well as a number of other contributors besides just PMC/committers). 
On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <[hidden email]> wrote:
I was just wondering, before we move on to SO. Do we have enough contributors with enough reputation to manage things in SO? We would need contributors with enough reputation to have relevant privileges. For example: creating tags (requires 1500 reputation), editing questions and answers (2000), creating tag synonyms (2500), approving tag wiki edits (5000), access to moderator tools (10000, this is required to delete questions etc.), protecting questions (15000). All of these are important if we plan to have SO as a main resource. I know I originally suggested SO, however, if we do not have contributors with the required privileges and the willingness to help manage everything then I am not sure this is a good fit.
Assaf.

From: Denny Lee [via Apache Spark Developers List] [mailto:[hidden email]]
Sent: Wednesday, November 09, 2016 9:54 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

Agreed that simply moving the questions to SO will not solve anything, but I think the call out about the meta-tags is that we need to abide by SO rules, and if we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventions around SO. Saying this, perhaps we could suggest tags that we place in the header of the question, whether it be SO or the mailing lists, that will help us sort through all of these questions faster, just as you suggested. The Proposed Community Mailing Lists / StackOverflow Changes <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> has been updated to include suggested tags. WDYT?
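The reputation question raised above amounts to a lookup against a threshold table, which can be sketched as below. The names PRIVILEGE_THRESHOLDS, privileges_for and missing_privileges are hypothetical, introduced only for illustration. The numbers are the ones quoted in the thread, except the moderator-tools figure, which is garbled in the archive; 10000 is assumed here because that is the threshold Stack Overflow documents for that privilege.

```python
from typing import Dict, List

# Thresholds as quoted in the thread (Stack Overflow, November 2016).
# The moderator-tools value is an assumption: the archived text is garbled there.
PRIVILEGE_THRESHOLDS: Dict[str, int] = {
    "create tags": 1500,
    "edit questions and answers": 2000,
    "create tag synonyms": 2500,
    "approve tag wiki edits": 5000,
    "access moderator tools": 10000,
    "protect questions": 15000,
}


def privileges_for(reputation: int) -> List[str]:
    """List the privileges a single user with this reputation would hold."""
    return [name for name, needed in PRIVILEGE_THRESHOLDS.items()
            if reputation >= needed]


def missing_privileges(reputations: List[int]) -> List[str]:
    """List the privileges that no contributor in the group can exercise."""
    best = max(reputations, default=0)
    return [name for name, needed in PRIVILEGE_THRESHOLDS.items()
            if best < needed]


if __name__ == "__main__":
    # Hypothetical reputations for a handful of contributors.
    group = [120, 2600, 6000]
    print(missing_privileges(group))
```

Checking only the highest reputation in the group is enough here, because each privilege is unlocked by any single user who passes its threshold.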
RE: Handling questions in the mailing lists
Should probably also update the helping others section in the how to contribute section (https://cwiki.apache.org/confluence/display/SPARK/Contributing+to+Spark#ContributingtoSpark-ContributingbyHelpingOtherUsers) Assaf. From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+s1001551n19835...@n3.nabble.com] Sent: Sunday, November 13, 2016 8:52 AM To: Mendelson, Assaf Subject: Re: Handling questions in the mailing lists Hey Reynold, Looks like we all of the proposed changes into Proposed Community Mailing Lists / StackOverflow Changes<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>. Anything else we can do to update the Spark Community page / welcome email? Meanwhile, let's all start answering questions on SO, eh?! :) Denny On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <[hidden email]> wrote: That's a good question, looking at http://stackoverflow.com/tags/apache-spark/topusers shows a few contributors who have already been active on SO including some committers and PMC members with very high overall SO reputations for any administrative needs (as well as a number of other contributors besides just PMC/committers). On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <[hidden email]> wrote: I was just wondering, before we move on to SO. Do we have enough contributors with enough reputation do manage things in SO? We would need contributors with enough reputation to have relevant privilages. For example: creating tags (requires 1500 reputation), edit questions and answers (2000), create tag synonums (2500), approve tag wiki edits (5000), access to moderator tools (1, this is required to delete questions etc.), protect questions (15000). All of these are important if we plan to have SO as a main resource. I know I originally suggested SO, however, if we do not have contributors with the required privileges and the willingness to help manage everything then I am not sure this is a good fit. Assaf. 
From: Denny Lee [via Apache Spark Developers List] [mailto:[hidden email][hidden email]<http://user/SendEmail.jtp?type=node=19800=0>] Sent: Wednesday, November 09, 2016 9:54 AM To: Mendelson, Assaf Subject: Re: Handling questions in the mailing lists Agreed that by simply just moving the questions to SO will not solve anything but I think the call out about the meta-tags is that we need to abide by SO rules and if we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventions around SO. Saying this, perhaps we could suggest tags that we place in the header of the question whether it be SO or the mailing lists that will help us sort through all of these questions faster just as you suggested. The Proposed Community Mailing Lists / StackOverflow Changes<https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> has been updated to include suggested tags. WDYT? On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <[hidden email]<http://user/SendEmail.jtp?type=node=19799=0>> wrote: I like the document and I think it is good but I still feel like we are missing an important part here. Look at SO today. There are: - 4658 unanswered questions under apache-spark tag. - 394 unanswered questions under spark-dataframe tag. - 639 unanswered questions under apache-spark-sql - 859 unanswered questions under pyspark Just moving people to ask there will not help. The whole issue is having people answer the questions. The problem is that many of these questions do not fit SO (but are already there so they are noise), are bad (i.e. unclear or hard to answer), orphaned etc. while some are simply harder than what people with some experience in spark can handle and require more expertise. The problem is that people with the relevant expertise are drowning in noise. This. Is true for the mailing list and this is true for SO. 
For this reason I believe that just moving people to SO will not solve anything. My original thought was that if we had different tags then different people could watch open questions on these tags and therefore have a much lower noise. I thought that we would have a low tier (current one) of people just not following the documentation (which would remain as noise), then a beginner tier where we could have people downvoting bad questions but in most cases the community can answer the questions because they are common, then a “medium” tier which would mean harder questions but that can still be answered by advanced users and lastly an “advanced” tier to which committers can actually subscribed to (and adding sub tags for subsystems would improve this even more). I was not aware of SO policy for meta tags (the burnination link is about removing tags completely so I am not sure
Re: Handling questions in the mailing lists
Hey Reynold, Looks like we all of the proposed changes into Proposed Community Mailing Lists / StackOverflow Changes <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p>. Anything else we can do to update the Spark Community page / welcome email? Meanwhile, let's all start answering questions on SO, eh?! :) Denny On Thu, Nov 10, 2016 at 1:54 PM Holden Karau <hol...@pigscanfly.ca> wrote: > That's a good question, looking at > http://stackoverflow.com/tags/apache-spark/topusers shows a few > contributors who have already been active on SO including some committers > and PMC members with very high overall SO reputations for any > administrative needs (as well as a number of other contributors besides > just PMC/committers). > > On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <assaf.mendel...@rsa.com> > wrote: > > I was just wondering, before we move on to SO. > > Do we have enough contributors with enough reputation do manage things in > SO? > > We would need contributors with enough reputation to have relevant > privilages. > > For example: creating tags (requires 1500 reputation), edit questions and > answers (2000), create tag synonums (2500), approve tag wiki edits (5000), > access to moderator tools (1, this is required to delete questions > etc.), protect questions (15000). > > All of these are important if we plan to have SO as a main resource. > > I know I originally suggested SO, however, if we do not have contributors > with the required privileges and the willingness to help manage everything > then I am not sure this is a good fit. > > Assaf. 
Re: Handling questions in the mailing lists
That's a good question. Looking at http://stackoverflow.com/tags/apache-spark/topusers shows a few contributors who have already been active on SO, including some committers and PMC members with very high overall SO reputations for any administrative needs (as well as a number of other contributors besides just PMC/committers).

On Wed, Nov 9, 2016 at 2:18 AM, assaf.mendelson <assaf.mendel...@rsa.com> wrote:

> I was just wondering, before we move on to SO.
>
> Do we have enough contributors with enough reputation to manage things in SO?
>
> We would need contributors with enough reputation to have the relevant privileges.
>
> For example: creating tags (requires 1500 reputation), edit questions and answers (2000), create tag synonyms (2500), approve tag wiki edits (5000), access to moderator tools (10000, this is required to delete questions etc.), protect questions (15000).
>
> All of these are important if we plan to have SO as a main resource.
>
> I know I originally suggested SO; however, if we do not have contributors with the required privileges and the willingness to help manage everything, then I am not sure this is a good fit.
>
> Assaf.
>
> *From:* Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
> *Sent:* Wednesday, November 09, 2016 9:54 AM
> *To:* Mendelson, Assaf
> *Subject:* Re: Handling questions in the mailing lists
>
> Agreed that by simply just moving the questions to SO will not solve anything but I think the call out about the meta-tags is that we need to abide by SO rules and if we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventions around SO.
>
> Saying this, perhaps we could suggest tags that we place in the header of the question whether it be SO or the mailing lists that will help us sort through all of these questions faster just as you suggested.
The Proposed > Community Mailing Lists / StackOverflow Changes > <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> > has > been updated to include suggested tags. WDYT? > > > > On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <[hidden email] > <http:///user/SendEmail.jtp?type=node=19799=0>> wrote: > > I like the document and I think it is good but I still feel like we are > missing an important part here. > > > > Look at SO today. There are: > > - 4658 unanswered questions under apache-spark tag. > > - 394 unanswered questions under spark-dataframe tag. > > - 639 unanswered questions under apache-spark-sql > > - 859 unanswered questions under pyspark > > > > Just moving people to ask there will not help. The whole issue is having > people answer the questions. > > > > The problem is that many of these questions do not fit SO (but are already > there so they are noise), are bad (i.e. unclear or hard to answer), > orphaned etc. while some are simply harder than what people with some > experience in spark can handle and require more expertise. > > The problem is that people with the relevant expertise are drowning in > noise. This. Is true for the mailing list and this is true for SO. > > > > For this reason I believe that just moving people to SO will not solve > anything. > > > > My original thought was that if we had different tags then different > people could watch open questions on these tags and therefore have a much > lower noise. 
I thought that we would have a low tier (current one) of > people just not following the documentation (which would remain as noise), > then a beginner tier where we could have people downvoting bad questions > but in most cases the community can answer the questions because they are > common, then a “medium” tier which would mean harder questions but that can > still be answered by advanced users and lastly an “advanced” tier to which > committers can actually subscribed to (and adding sub tags for subsystems > would improve this even more). > > > > I was not aware of SO policy for meta tags (the burnination link is about > removing tags completely so I am not sure how it applies, I believe this > link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more > relevant). > > There was actually a discussion along the lines in SO ( > http://meta.stackoverflow.com/questions/253338/filtering-questions-by- > difficulty-level). > > > > The fact that SO did not solve this issue, does not mean we shouldn’t > either. > > > > The way I s
Re: Handling questions in the mailing lists
If you take a look at the statistics (https://data.stackexchange.com/stackoverflow/query/575406) you'll see that the majority of the unanswered questions:

* have seen no activity in the last year OR
* don't have a positive score OR
* have been asked by inactive or new users.

This is usually a good indicator that a question is of poor quality and/or abandoned and, for various reasons, hasn't been picked up by the removal process (https://stackoverflow.com/help/roomba). This is not unusual for Stack Overflow, and with a little bit of organized effort it could be cleaned up in a few weeks. Arguably, for a technology with a large number of moving parts, Spark has a pretty decent /answer rate/, definitely better than many comparable projects.

Regarding tagging. Putting community rules aside, clean questions which can be answered with relatively low effort are usually resolved in a few days. What is left is either too time-consuming, too complex, or just not worth the time. If you have a lot of time, the former can be easily selected using predefined filters, and the rest usually qualifies for closing.

Still, I believe there is a really important missing point here. All of that requires a lot of effort, and it is slightly unrealistic to expect that the number of people willing and having time to contribute will suddenly grow. So the focus should be on having a knowledge base which can reduce the number of questions to be answered. SO has good visibility, a large number of existing answers, and very good tools.

On 11/09/2016 08:02 AM, assaf.mendelson wrote:

> I like the document and I think it is good but I still feel like we are missing an important part here.
>
> Look at SO today. There are:
>
> - 4658 unanswered questions under apache-spark tag.
> - 394 unanswered questions under spark-dataframe tag.
> - 639 unanswered questions under apache-spark-sql
> - 859 unanswered questions under pyspark
>
> Just moving people to ask there will not help.
The whole issue is > having people answer the questions. > > > > The problem is that many of these questions do not fit SO (but are > already there so they are noise), are bad (i.e. unclear or hard to > answer), orphaned etc. while some are simply harder than what people > with some experience in spark can handle and require more expertise. > > The problem is that people with the relevant expertise are drowning in > noise. This. Is true for the mailing list and this is true for SO. > > > > For this reason I believe that just moving people to SO will not solve > anything. > > > > My original thought was that if we had different tags then different > people could watch open questions on these tags and therefore have a > much lower noise. I thought that we would have a low tier (current > one) of people just not following the documentation (which would > remain as noise), then a beginner tier where we could have people > downvoting bad questions but in most cases the community can answer > the questions because they are common, then a “medium” tier which > would mean harder questions but that can still be answered by advanced > users and lastly an “advanced” tier to which committers can actually > subscribed to (and adding sub tags for subsystems would improve this > even more). > > > > I was not aware of SO policy for meta tags (the burnination link is > about removing tags completely so I am not sure how it applies, I > believe this link > https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more > relevant). > > There was actually a discussion along the lines in SO > (http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level). > > > > The fact that SO did not solve this issue, does not mean we shouldn’t > either. > > > > The way I see it, some tags can easily be used even with the meta tags > limitation. For example, using spark-internal-development tag can be > used to ask questions for development of spark. 
There are already tags > for some spark subsystems (there is a apachae-spark-sql tag, a pyspark > tag, a spark-streaming tag etc.). The main issue I see and the one we > can’t seem to get around is dividing between simple questions that the > community should answer and hard questions which only advanced users > can answer. > > > > Maybe SO isn’t the correct platform for that but even within it we can > try to find a non meta name for spark beginner questions vs. spark > advanced questions. > > Assaf. > > > > > > *From:*Denny Lee [via Apache Spark Developers List] > [mailto:ml-node+[hidden email] > ] > *Sent:* Tuesday, November 08, 2016 7:53 AM > *To:* Mendelson, Assaf > *Subject:
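Editor's sketch: the three criteria from the message above (no activity in the last year, no positive score, asked by an inactive or new user) can be expressed as a simple triage filter. This is only an illustration under stated assumptions. The `Question` record, its field names, and the `min_asker_rep` cutoff used as a stand-in for "inactive or new user" are hypothetical and are not taken from the linked Data Explorer query.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Question:
    # Hypothetical record; field names are illustrative, not the Stack Exchange schema.
    title: str
    score: int
    last_activity: datetime
    asker_reputation: int


def is_probably_abandoned(q, now, max_idle=timedelta(days=365), min_asker_rep=50):
    """Return True if any of the three criteria from the message holds."""
    idle = now - q.last_activity > max_idle
    not_positive = q.score <= 0
    # Assumed proxy for "inactive or new user": very low reputation.
    new_or_inactive_asker = q.asker_reputation < min_asker_rep
    return idle or not_positive or new_or_inactive_asker


def triage(questions, now):
    """Split questions into (worth_answering, cleanup_candidates)."""
    keep, cleanup = [], []
    for q in questions:
        (cleanup if is_probably_abandoned(q, now) else keep).append(q)
    return keep, cleanup
```

Questions in the second list would be candidates for closing or deletion review; the first list is what answerers would actually watch.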
Re: Handling questions in the mailing lists
nowib.com> wrote:

> I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th century tools (mailing lists) with some add-ons (like Stack Overflow).
>
> As usual, Sean and Cody's contributions are very much to the point. I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?
>
> On 2 November 2016 at 16:36, Cody Koeninger <c...@koeninger.org> wrote:
>
> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're asking about
>
> - contributors could monitor user@ for tags relating to components they've worked on. I'd be surprised if my miss rate for any mailing list questions well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs to improve documentation. It's a lot easier to answer even poorly-asked questions with a link to relevant docs.
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
> > There's already reviews@ and issues@. dev@ is for project development itself and I think is OK. You're suggesting splitting up user@ and I sympathize with the motivation. Experience tells me that we'll have a beginner@ that's then totally ignored, and people will quickly learn to post to advanced@ to get attention, and we'll be back where we started. Putting it in JIRA doesn't help. I don't think this is a problem that is merely down to lack of process. It actually requires cultivating a culture change on the community list.
> >
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <assaf.mendel...@rsa.com> wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >> > >> For example, we might say that mailing list A is only for voting, > mailing > >> list B is only for PR and have something like stack overflow for > developer > >> questions (I would even go as far as to have beginner, intermediate and > >> advanced mailing list for users and beginner/advanced for dev). > >> > >> > >> > >> This can easily be done using stack overflow tags, however, that would > >> probably be harder to manage. > >> > >> Maybe using special jira tags and manage it in jira? > >> > >> > >> > >> Anyway as I said, the main issue is not user questions (except maybe > >> advanced ones) but more for dev questions. It is so easy to get lost in > the > >> chatter that it makes it very hard for people to learn spark internals… > >> > >> Assaf. > >> > >> > >> > >> From: Sean Owen [mailto:so...@cloudera.com] > >> Sent: Wednesday, November 02, 2016 2:07 PM > >> To: Mendelson, Assaf; dev@spark.apache.org > >> Subject: Re: Handling questions in the mailing lists > >> > >> > >> > >> I think that unfortunately mailing lists don't scale well. This one has > >> thousands of subscribers with different interests and levels of > experience. > >> For any given person, most messages will be irrelevant. I also find > that a > >> lot of questions on user@ are not well-asked, aren't an SSCCE > >> (http://sscce.org/), not something most people are going to bother > replying > >> to even if they could answer. I almost entirely ignore user@ because > there > >> are higher-priority channels like PRs to deal with, that already have > >> hundreds of messages per day. This is why little of it gets an answer > -- too > >> noisy. > >> > >> > >> > >> We have to have official mailing lists, in any event, to have some > >> official channel for things like votes and announcements. It's not > wrong to > >> ask questions on user@ of course, but a lot of the questions I see > could > >> have been answered with research of existing docs or looking at the > code. 
I > >> think that given the scale of the list, it's not wrong to assert that > this > >> is sort of a prerequisite for asking thousands of people to answer one's > >> question. But we can't enforce that. > >> > >> > >> > >> The situation will get better to the extent people ask better questions, > >> help other people ask better questions, and answer good questions. I'd > >> encourage anyone feeling this way to try to help along those dimensions. > >> >
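Editor's sketch: Cody's suggestion quoted above (users tag subject lines by component, contributors watch the tags for components they know) could be supported by a small mail filter. The bracketed `[Component]` subject convention and the function names below are assumptions made for illustration; the thread does not specify a tag syntax.

```python
import re

# Assumed convention: tags appear in square brackets, e.g. "[Kafka][Streaming] ...".
TAG_PATTERN = re.compile(r"\[([^\]]+)\]")


def subject_tags(subject):
    """Return the set of lower-cased bracketed tags found in a subject line."""
    return {tag.strip().lower() for tag in TAG_PATTERN.findall(subject)}


def matches_interest(subject, watched):
    """True if the subject carries at least one tag the contributor watches."""
    return bool(subject_tags(subject) & {w.lower() for w in watched})
```

A contributor who has worked on the Kafka integration would call `matches_interest(subject, ["kafka"])` on incoming user@ messages and read only the hits.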
Re: Handling questions in the mailing lists
7eLnzRidSkrsKKG0xKw=TKTxY_sYw@ >>>>>mail.gmail.com%3E >>>>> >>>>> <https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=tktxy_...@mail.gmail.com%3E> >>>>> >>>>> (It’s ironic that it’s difficult to follow the past discussion on why >>>>> we can’t change our official communication tools due to those very tools…) >>>>> >>>>> Nick >>>>> >>>>> >>>>> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida < >>>>> ricardo.alme...@actnowib.com> wrote: >>>>> >>>>>> I fell Assaf point is quite relevant if we want to move this project >>>>>> forward from the Spark user perspective (as I do). In fact, we're >>>>>> still using 20th century tools (mailing lists) with some add-ons (like >>>>>> Stack Overflow). >>>>>> >>>>>> As usually, Sean and Cody's contributions are very to the point. >>>>>> I fell it is indeed a matter of of culture (hard to enforce) and tools >>>>>> (much easier). Isn't it? >>>>>> >>>>>> On 2 November 2016 at 16:36, Cody Koeninger <c...@koeninger.org> >>>>>> wrote: >>>>>> >>>>>>> So concrete things people could do >>>>>>> >>>>>>> - users could tag subject lines appropriately to the component >>>>>>> they're >>>>>>> asking about >>>>>>> >>>>>>> - contributors could monitor user@ for tags relating to components >>>>>>> they've worked on. >>>>>>> I'd be surprised if my miss rate for any mailing list questions >>>>>>> well-labeled as Kafka was higher than 5% >>>>>>> >>>>>>> - committers could be more aggressive about soliciting and merging >>>>>>> PRs >>>>>>> to improve documentation. >>>>>>> It's a lot easier to answer even poorly-asked questions with a link >>>>>>> to >>>>>>> relevant docs. >>>>>>> >>>>>>> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> >>>>>>> wrote: >>>>>>> > There's already reviews@ and issues@. dev@ is for project >>>>>>> development itself >>>>>>> > and I think is OK. You're suggesting splitting up user@ and I >>>>>>> sympathize >>>>>>> > with the motivation. 
Experience tells me that we'll have a >>>>>>> beginner@ that's >>>>>>> > then totally ignored, and people will quickly learn to post to >>>>>>> advanced@ to >>>>>>> > get attention, and we'll be back where we started. Putting it in >>>>>>> JIRA >>>>>>> > doesn't help. I don't think this a problem that is merely down to >>>>>>> lack of >>>>>>> > process. It actually requires cultivating a culture change on the >>>>>>> community >>>>>>> > list. >>>>>>> > >>>>>>> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf < >>>>>>> assaf.mendel...@rsa.com> >>>>>>> > wrote: >>>>>>> >> >>>>>>> >> What I am suggesting is basically to fix that. >>>>>>> >> >>>>>>> >> For example, we might say that mailing list A is only for voting, >>>>>>> mailing >>>>>>> >> list B is only for PR and have something like stack overflow for >>>>>>> developer >>>>>>> >> questions (I would even go as far as to have beginner, >>>>>>> intermediate and >>>>>>> >> advanced mailing list for users and beginner/advanced for dev). >>>>>>> >> >>>>>>> >> >>>>>>> >> >>>>>>> >> This can easily be done using stack overflow tags, however, that >>>>>>> would >>>>>>> >> probably be harder to manage. >>>>>>> >> >>>>>>> >> Maybe using special jira tags and manage it in jira? >>>>>>>
RE: Handling questions in the mailing lists
I was just wondering, before we move on to SO.

Do we have enough contributors with enough reputation to manage things in SO?

We would need contributors with enough reputation to have the relevant privileges.

For example: creating tags (requires 1500 reputation), edit questions and answers (2000), create tag synonyms (2500), approve tag wiki edits (5000), access to moderator tools (10000, this is required to delete questions etc.), protect questions (15000).

All of these are important if we plan to have SO as a main resource.

I know I originally suggested SO; however, if we do not have contributors with the required privileges and the willingness to help manage everything, then I am not sure this is a good fit.

Assaf.

From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+s1001551n19799...@n3.nabble.com]
Sent: Wednesday, November 09, 2016 9:54 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

Agreed that by simply just moving the questions to SO will not solve anything but I think the call out about the meta-tags is that we need to abide by SO rules and if we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventions around SO.

Saying this, perhaps we could suggest tags that we place in the header of the question whether it be SO or the mailing lists that will help us sort through all of these questions faster just as you suggested. The Proposed Community Mailing Lists / StackOverflow Changes <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> has been updated to include suggested tags. WDYT?

On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <[hidden email]> wrote:

I like the document and I think it is good but I still feel like we are missing an important part here.

Look at SO today. There are:

- 4658 unanswered questions under apache-spark tag.
- 394 unanswered questions under spark-dataframe tag.
- 639 unanswered questions under apache-spark-sql - 859 unanswered questions under pyspark Just moving people to ask there will not help. The whole issue is having people answer the questions. The problem is that many of these questions do not fit SO (but are already there so they are noise), are bad (i.e. unclear or hard to answer), orphaned etc. while some are simply harder than what people with some experience in spark can handle and require more expertise. The problem is that people with the relevant expertise are drowning in noise. This. Is true for the mailing list and this is true for SO. For this reason I believe that just moving people to SO will not solve anything. My original thought was that if we had different tags then different people could watch open questions on these tags and therefore have a much lower noise. I thought that we would have a low tier (current one) of people just not following the documentation (which would remain as noise), then a beginner tier where we could have people downvoting bad questions but in most cases the community can answer the questions because they are common, then a “medium” tier which would mean harder questions but that can still be answered by advanced users and lastly an “advanced” tier to which committers can actually subscribed to (and adding sub tags for subsystems would improve this even more). I was not aware of SO policy for meta tags (the burnination link is about removing tags completely so I am not sure how it applies, I believe this link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more relevant). There was actually a discussion along the lines in SO (http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level). The fact that SO did not solve this issue, does not mean we shouldn’t either. The way I see it, some tags can easily be used even with the meta tags limitation. 
For example, using spark-internal-development tag can be used to ask questions for development of spark. There are already tags for some spark subsystems (there is a apachae-spark-sql tag, a pyspark tag, a spark-streaming tag etc.). The main issue I see and the one we can’t seem to get around is dividing between simple questions that the community should answer and hard questions which only advanced users can answer. Maybe SO isn’t the correct platform for that but even within it we can try to find a non meta name for spark beginner questions vs. spark advanced questions. Assaf. From: Denny Lee [via Apache Spark Developers List] [mailto:[hidden email][hidden email]<http://user/SendEmail.jtp?type=node=19798=0>] Sent: Tuesday, November 08, 2016 7:53 AM To: Mendelson, Assaf Subject: Re: Handling questions in the mailing lists To help track and get the verbiage for the Spark community page and welcome email jump started, here's a working document for us to
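Editor's sketch: the question raised at the top of this message (do enough contributors hold the privileges needed to manage the tags?) amounts to comparing contributor reputations against the thresholds listed there. The snippet below is a minimal illustration; the function names and any reputation values passed in are hypothetical, and the 10000 figure for moderator tools is Stack Overflow's published threshold.

```python
# Reputation thresholds as listed in the message above.
PRIVILEGE_THRESHOLDS = {
    "create tags": 1500,
    "edit questions and answers": 2000,
    "create tag synonyms": 2500,
    "approve tag wiki edits": 5000,
    "access to moderator tools": 10000,  # Stack Overflow's published threshold
    "protect questions": 15000,
}


def privileges_for(reputation):
    """List the privileges a user with the given reputation holds."""
    return [name for name, needed in PRIVILEGE_THRESHOLDS.items() if reputation >= needed]


def coverage(contributor_reps):
    """Map each privilege to the number of contributors who hold it."""
    return {
        name: sum(rep >= needed for rep in contributor_reps)
        for name, needed in PRIVILEGE_THRESHOLDS.items()
    }
```

Running `coverage` over the reputations of the active Spark answerers would show at a glance whether any privilege has nobody to exercise it.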
Re: Handling questions in the mailing lists
Agreed that by simply just moving the questions to SO will not solve anything but I think the call out about the meta-tags is that we need to abide by SO rules and if we were to just jump in and start creating meta-tags, we would be violating at minimum the spirit and at maximum the actual conventions around SO. Saying this, perhaps we could suggest tags that we place in the header of the question whether it be SO or the mailing lists that will help us sort through all of these questions faster just as you suggested. The Proposed Community Mailing Lists / StackOverflow Changes <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit#heading=h.xshc1bv4sn3p> has been updated to include suggested tags. WDYT? On Tue, Nov 8, 2016 at 11:02 PM assaf.mendelson <assaf.mendel...@rsa.com> wrote: > I like the document and I think it is good but I still feel like we are > missing an important part here. > > > > Look at SO today. There are: > > - 4658 unanswered questions under apache-spark tag. > > - 394 unanswered questions under spark-dataframe tag. > > - 639 unanswered questions under apache-spark-sql > > - 859 unanswered questions under pyspark > > > > Just moving people to ask there will not help. The whole issue is having > people answer the questions. > > > > The problem is that many of these questions do not fit SO (but are already > there so they are noise), are bad (i.e. unclear or hard to answer), > orphaned etc. while some are simply harder than what people with some > experience in spark can handle and require more expertise. > > The problem is that people with the relevant expertise are drowning in > noise. This. Is true for the mailing list and this is true for SO. > > > > For this reason I believe that just moving people to SO will not solve > anything. > > > > My original thought was that if we had different tags then different > people could watch open questions on these tags and therefore have a much > lower noise. 
I thought that we would have a low tier (current one) of > people just not following the documentation (which would remain as noise), > then a beginner tier where we could have people downvoting bad questions > but in most cases the community can answer the questions because they are > common, then a “medium” tier which would mean harder questions but that can > still be answered by advanced users and lastly an “advanced” tier to which > committers can actually subscribed to (and adding sub tags for subsystems > would improve this even more). > > > > I was not aware of SO policy for meta tags (the burnination link is about > removing tags completely so I am not sure how it applies, I believe this > link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more > relevant). > > There was actually a discussion along the lines in SO ( > http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level > ). > > > > The fact that SO did not solve this issue, does not mean we shouldn’t > either. > > > > The way I see it, some tags can easily be used even with the meta tags > limitation. For example, using spark-internal-development tag can be used > to ask questions for development of spark. There are already tags for some > spark subsystems (there is a apachae-spark-sql tag, a pyspark tag, a > spark-streaming tag etc.). The main issue I see and the one we can’t seem > to get around is dividing between simple questions that the community > should answer and hard questions which only advanced users can answer. > > > > Maybe SO isn’t the correct platform for that but even within it we can try > to find a non meta name for spark beginner questions vs. spark advanced > questions. > > Assaf. 
> > > > > > *From:* Denny Lee [via Apache Spark Developers List] [mailto:ml-node+[hidden > email] <http:///user/SendEmail.jtp?type=node=19798=0>] > *Sent:* Tuesday, November 08, 2016 7:53 AM > *To:* Mendelson, Assaf > > > *Subject:* Re: Handling questions in the mailing lists > > > > To help track and get the verbiage for the Spark community page and > welcome email jump started, here's a working document for us to work with: > https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit# > <https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit> > > > > Hope this will help us collaborate on this stuff a little faster. > > On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email] > <http:///user/SendEmail.jtp?type=node=19770=0>> wrote: > > Just a couple of random thoughts regarding Stack Overflo
RE: Handling questions in the mailing lists
I like the document and I think it is good but I still feel like we are missing an important part here.

Look at SO today. There are:

- 4658 unanswered questions under apache-spark tag.
- 394 unanswered questions under spark-dataframe tag.
- 639 unanswered questions under apache-spark-sql
- 859 unanswered questions under pyspark

Just moving people to ask there will not help. The whole issue is having people answer the questions.

The problem is that many of these questions do not fit SO (but are already there so they are noise), are bad (i.e. unclear or hard to answer), orphaned etc., while some are simply harder than what people with some experience in Spark can handle and require more expertise.

The problem is that people with the relevant expertise are drowning in noise. This is true for the mailing list and this is true for SO.

For this reason I believe that just moving people to SO will not solve anything.

My original thought was that if we had different tags then different people could watch open questions on these tags and therefore have much lower noise. I thought that we would have a low tier (current one) of people just not following the documentation (which would remain as noise), then a beginner tier where we could have people downvoting bad questions but in most cases the community can answer the questions because they are common, then a “medium” tier which would mean harder questions but that can still be answered by advanced users, and lastly an “advanced” tier to which committers can actually subscribe (and adding sub tags for subsystems would improve this even more).

I was not aware of SO policy for meta tags (the burnination link is about removing tags completely so I am not sure how it applies; I believe this link https://stackoverflow.blog/2010/08/the-death-of-meta-tags/ is more relevant).

There was actually a discussion along these lines in SO (http://meta.stackoverflow.com/questions/253338/filtering-questions-by-difficulty-level).
The fact that SO did not solve this issue does not mean we shouldn’t try to.

The way I see it, some tags can easily be used even with the meta tags limitation. For example, a spark-internal-development tag can be used to ask questions about development of Spark. There are already tags for some Spark subsystems (there is an apache-spark-sql tag, a pyspark tag, a spark-streaming tag etc.). The main issue I see and the one we can’t seem to get around is dividing between simple questions that the community should answer and hard questions which only advanced users can answer.

Maybe SO isn’t the correct platform for that but even within it we can try to find a non meta name for Spark beginner questions vs. Spark advanced questions.

Assaf.

From: Denny Lee [via Apache Spark Developers List] [mailto:ml-node+s1001551n19770...@n3.nabble.com]
Sent: Tuesday, November 08, 2016 7:53 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

To help track and get the verbiage for the Spark community page and welcome email jump started, here's a working document for us to work with: https://docs.google.com/document/d/1N0pKatcM15cqBPqFWCqIy6jdgNzIoacZlYDCjufBh2s/edit

Hope this will help us collaborate on this stuff a little faster.

On Mon, Nov 7, 2016 at 2:25 PM Maciej Szymkiewicz <[hidden email]> wrote:

Just a couple of random thoughts regarding Stack Overflow...

* If we are thinking about shifting focus towards SO all attempts of micromanaging should be discarded right in the beginning. Especially things like meta tags, which are discouraged and "burninated" (https://meta.stackoverflow.com/tags/burninate-request/info), or thread bumping. Depending on the context these won't be manageable, go against community guidelines or are simply obsolete.
* Lack of expertise is unlikely to be an issue. Even now there are a number of advanced Spark users on SO. Of course the more the merrier.
Things that can be easily improved: * Identifying, improving and promoting canonical questions and answers. It means closing duplicate, suggesting edits to improve existing answers, providing alternative solutions. This can be also used to identify gaps in the documentation. * Providing a set of clear posting guidelines to reduce effort required to identify the problem (think about http://stackoverflow.com/q/5963269 a.k.a How to make a great R reproducible example?) * Helping users decide if question is a good fit for SO (see below). API questions are great fit, debugging problems like "my cluster is slow" are not. * Actively cleaning (closing, deleting) off-topic and low quality questions. The less junk to sieve through the better chance of good questions being answered. * Repurposing and actively moderating SO docs (https://stackoverflo
Re: Handling questions in the mailing lists
>> t is so easy to get lost in the chatter that it makes it very hard for people to learn spark internals…
>>
>> Assaf.
>>
>> From: Sean Owen [mailto:so...@cloudera.com]
>> Sent: Wednesday, November 02, 2016 2:07 PM
>> To: Mendelson, Assaf; dev@spark.apache.org
>> Subject: Re: Handling questions in the mailing lists
>>
>> I think that unfortunately mailing lists don't scale well. This one has thousands of subscribers with different interests and levels of experience. For any given person, most messages will be irrelevant. I also find that a lot of questions on user@ are not well-asked, aren't an SSCCE (http://sscce.org/), not something most people are going to bother replying to even if they could answer. I almost entirely ignore user@ because there are higher-priority channels like PRs to deal with, that already have hundreds of messages per day. This is why little of it gets an answer -- too noisy.
>>
>> We have to have official mailing lists, in any event, to have some official channel for things like votes and announcements. It's not wrong to ask questions on user@ of course, but a lot of the questions I see could have been answered with research of existing docs or looking at the code. I think that given the scale of the list, it's not wrong to assert that this is sort of a prerequisite for asking thousands of people to answer one's question. But we can't enforce that.
>>
>> The situation will get better to the extent people ask better questions, help other people ask better questions, and answer good questions. I'd encourage anyone feeling this way to try to help along those dimensions.
>>
>> On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <assaf.mendel...@rsa.com> wrote:
>>
>> Hi,
>>
>> I know this is a little off topic but I wanted to raise an issue about handling questions in the mailing list (this is true both for the user mailing list and the dev but since there are other options such as stack overflow for user questions, this is more problematic in dev).
>>
>> Let’s say I ask a question (as I recently did). Unfortunately this was during spark summit in Europe so probably people were busy. In any case no one answered.
>>
>> The problem is, that if no one answers very soon, the question will almost certainly remain unanswered because new messages will simply drown it.
>>
>> This is a common issue not just for questions but for any comment or idea which is not immediately picked up.
>>
>> I believe we should have a method of handling this.
>>
>> Generally, I would say these types of things belong in stack overflow, after all, the way it is built is perfect for this. More seasoned spark contributors and committers can periodically check out unanswered questions and answer them.
>>
>> The problem is that stack overflow (as well as other targets such as the databricks forums) tend to have a more user based orientation. This means that any spark internal question will almost certainly remain unanswered.
>>
>> I was wondering if we could come up with a solution for this.
>>
>> Assaf.

--
Maciej Szymkiewicz
Re: Handling questions in the mailing lists
> d on the ASF member list (which is private so there is no public archive).
>
> It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <r...@databricks.com> wrote:
>
> Actually after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (dev list or jira), but user questions don't have to.
>
> I'm going to double check this. If it is true, I would actually recommend us moving the Q&A part of the user list entirely over to stackoverflow, or at least make that the recommended way rather than the existing user list which is not very scalable.
>
> On Wednesday, November 2, 2016, Nicholas Chammas <nicholas.cham...@gmail.com> wrote:
>
> We’ve discussed several times upgrading our communication tools, as far back as 2014 and maybe even before that too. The bottom line is that we can’t due to ASF rules requiring the use of ASF-managed mailing lists.
>
> For some history, see this discussion:
>
> - https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oy5no2dhwj_kveop...@mail.gmail.com%3E
> - https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=tktxy_...@mail.gmail.com%3E
>
> (It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)
>
> Nick
>
> On Wed, Nov 2, 2016 at 12:24 PM Ricardo Almeida <ricardo.alme...@actnowib.com> wrote:
>
> I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th century tools (mailing lists) with some add-ons (like Stack Overflow).
>
> As usual, Sean and Cody's contributions are very to the point. I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?
>
> On 2 November 2016 at 16:36, Cody Koeninger <c...@koeninger.org> wrote:
>
> So concrete things people could do
>
> - users could tag subject lines appropriately to the component they're asking about
>
> - contributors could monitor user@ for tags relating to components they've worked on. I'd be surprised if my miss rate for any mailing list questions well-labeled as Kafka was higher than 5%
>
> - committers could be more aggressive about soliciting and merging PRs to improve documentation. It's a lot easier to answer even poorly-asked questions with a link to relevant docs.
>
> On Wed, Nov 2, 2016 at 7:39 AM, Sean Owen <so...@cloudera.com> wrote:
> > There's already reviews@ and issues@. dev@ is for project development itself and I think is OK. You're suggesting splitting up user@ and I sympathize with the motivation. Experience tells me that we'll have a beginner@ that's then totally ignored, and people will quickly learn to post to advanced@ to get attention, and we'll be back where we started. Putting it in JIRA doesn't help. I don't think this is a problem that is merely down to lack of process. It actually requires cultivating a culture change on the community list.
> >
> > On Wed, Nov 2, 2016 at 12:11 PM Mendelson, Assaf <assaf.mendel...@rsa.com> wrote:
> >>
> >> What I am suggesting is basically to fix that.
> >>
> >> For example, we might say that mailing list A is only for voting, mailing list B is only for PR and have something like stack overflow for developer questions (I would even go as far as to have beginner, intermediate and advanced mailing list for users and beginner/advanced for dev).
> >>
> >> This can easily be done using stack overflow tags, however, that would probably be harder to manage.
> >>
> >> Maybe using special jira tags and manage it in jira?
> >>
> >> Anyway as I said, the main issue is not user questions (except maybe advanced ones) but more for dev questions. It is so easy to get lost in the chatter that it makes it very hard for people to learn spark internals…
> >>
> >> Assaf.
> >>
> >> From: Sean Owen [mailto:so...@cloudera.
Re: Handling questions in the mailing lists
Thanks Reynold for reviewing the ASF rules. Despite the potential issues mentioned, I feel using StackOverflow would be an improvement. And yes, some guidelines/instructions have the potential to improve the questions and the "escalation" process.

On 7 November 2016 at 10:48, <ioannis.deligian...@nomura.com> wrote:

> My two cents (As a user/consumer)…
>
> I have been following & using Spark in financial services before version 1 and before it migrated questions from Google Groups to apache mailing lists (which was a shame ☹).
>
> SO:
>
> There has been some momentum lately on SO, but as questions were not “monitored/answered” by Spark experts, the motivation of posting a question was low and in turn the quality of questions as well. As most of us know, SO is usually the first place to look for info and can greatly reduce the need to turn to user/dev groups so it would be great if there was more attention to it.
>
> Spark mailing lists:
>
> As the consensus appears to be, questions tend to get lost if not picked up within 1-2 days. Re-sending the same question feels “abusive” to me so I would then give up. Given that a good question takes time, putting effort into a question that can easily be ignored results in mailing a “bad” question (see what happens?) or no question at all. As you have probably observed, a few users will mail a question to “dev” with “…no answers in user list…” as they incorrectly assume that no-one can answer their question.
>
> JIRA:
>
> I find that “issues” are being quite aggressively closed down. I’ve seen this twice (one I reported myself and found the second ticket while looking for a solution) and for this reason it doesn’t encourage users to spend the time and effort to use it. Personally, I also feel that there is some bias on what is in-scope and out-of-scope.
>
> My preference would be that SO would be the first place that someone would post a question. If a few “experts” are found regularly answering questions, eventually Spark users will start using it more and reduce “user” load by easily finding previous answers (or the SO community marking duplicates). The same “experts” can also encourage users to “escalate” to JIRA, dev/user groups once a question has been properly filtered which is quite common.
>
> PS. Personally, I would not follow any “bespoke/external” process on SO, e.g. down-voting on SO for any other reason than being a bad question as per SO rules.
>
> From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
> Sent: 07 November 2016 07:45
> To: assaf.mendelson
> Cc: dev@spark.apache.org
> Subject: Re: Handling questions in the mailing lists
>
> Even for the mailing list, I'd love to have a short set of instructions on how to submit your questions (maybe on http://spark.apache.org/community.html or maybe in the welcome email when you subscribe). It would be great if someone added that. After all, we have such instructions for contributing PRs, for example.
>
> Matei
>
> On Nov 6, 2016, at 11:09 PM, assaf.mendelson <assaf.mendel...@rsa.com> wrote:
>
> There are other options as well. For example hosting an answerhub (www.answerhub.com) or other similar separate Q&A service.
>
> BTW, I believe the main issue is not how opinionated people are but who is answering questions.
>
> Today there are already people asking (and getting answers) on SO (including myself). The problem is that many people do not go to SO.
>
> The problem I see is how to “bump” up questions which are not being answered to someone more likely to be able to answer them. Simple questions can be answered by many people, many of them even newbies who ran into the issue themselves.
>
> The main issue is that the more complex the question, the fewer people there are who can answer it and those people’s bandwidth is already clogged by other questions.
>
> We could for example try to create tags on SO for “basic questions”, “medium”, “advanced”. Provide guidelines to ask first on basic, if not answered after X da
RE: Handling questions in the mailing lists
My two cents (As a user/consumer)…

I have been following & using Spark in financial services before version 1 and before it migrated questions from Google Groups to apache mailing lists (which was a shame ☹).

SO:

There has been some momentum lately on SO, but as questions were not “monitored/answered” by Spark experts, the motivation of posting a question was low and in turn the quality of questions as well. As most of us know, SO is usually the first place to look for info and can greatly reduce the need to turn to user/dev groups so it would be great if there was more attention to it.

Spark mailing lists:

As the consensus appears to be, questions tend to get lost if not picked up within 1-2 days. Re-sending the same question feels “abusive” to me so I would then give up. Given that a good question takes time, putting effort into a question that can easily be ignored results in mailing a “bad” question (see what happens?) or no question at all. As you have probably observed, a few users will mail a question to “dev” with “…no answers in user list…” as they incorrectly assume that no-one can answer their question.

JIRA:

I find that “issues” are being quite aggressively closed down. I’ve seen this twice (one I reported myself and found the second ticket while looking for a solution) and for this reason it doesn’t encourage users to spend the time and effort to use it. Personally, I also feel that there is some bias on what is in-scope and out-of-scope.

My preference would be that SO would be the first place that someone would post a question. If a few “experts” are found regularly answering questions, eventually Spark users will start using it more and reduce “user” load by easily finding previous answers (or the SO community marking duplicates). The same “experts” can also encourage users to “escalate” to JIRA, dev/user groups once a question has been properly filtered which is quite common.

PS. Personally, I would not follow any “bespoke/external” process on SO, e.g. down-voting on SO for any other reason than being a bad question as per SO rules.

From: Matei Zaharia [mailto:matei.zaha...@gmail.com]
Sent: 07 November 2016 07:45
To: assaf.mendelson
Cc: dev@spark.apache.org
Subject: Re: Handling questions in the mailing lists

Even for the mailing list, I'd love to have a short set of instructions on how to submit your questions (maybe on http://spark.apache.org/community.html or maybe in the welcome email when you subscribe). It would be great if someone added that. After all, we have such instructions for contributing PRs, for example.

Matei

On Nov 6, 2016, at 11:09 PM, assaf.mendelson <assaf.mendel...@rsa.com> wrote:

There are other options as well. For example hosting an answerhub (www.answerhub.com) or other similar separate Q&A service.

BTW, I believe the main issue is not how opinionated people are but who is answering questions.

Today there are already people asking (and getting answers) on SO (including myself). The problem is that many people do not go to SO.

The problem I see is how to “bump” up questions which are not being answered to someone more likely to be able to answer them. Simple questions can be answered by many people, many of them even newbies who ran into the issue themselves.

The main issue is that the more complex the question, the fewer people there are who can answer it and those people’s bandwidth is already clogged by other questions.

We could for example try to create tags on SO for “basic questions”, “medium”, “advanced”. Provide guidelines to ask first on basic, if not answered after X days then add the medium tag etc. Downvote people who don’t go by the process. This would mean that committers for example can look at advanced only tag and have a manageable number of questions they can help with while others can answer medium and basic.

I agree that some things are not good for SO. Basically stuff which asks for opinion is such but most cases in the mailing list are either “how do I solve this bug” or “how do I do X”. Either of those two is good for SO.

Assaf.

From: rxin [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
Sent: Monday, November 07, 2016 8:33 AM
To: Mendelson, Assaf
Subject: Re: Handling questions in the mailing lists

This is an excellent point. If we do go ahead and feature SO as a way for users to ask questions more prominently, as someone who k
Re: Handling questions in the mailing lists
Even for the mailing list, I'd love to have a short set of instructions on how to submit your questions (maybe on http://spark.apache.org/community.html or maybe in the welcome email when you subscribe). It would be great if someone added that. After all, we have such instructions for contributing PRs, for example.

Matei

> On Nov 6, 2016, at 11:09 PM, assaf.mendelson <assaf.mendel...@rsa.com> wrote:
>
> There are other options as well. For example hosting an answerhub (www.answerhub.com) or other similar separate Q&A service.
>
> BTW, I believe the main issue is not how opinionated people are but who is answering questions.
>
> Today there are already people asking (and getting answers) on SO (including myself). The problem is that many people do not go to SO.
>
> The problem I see is how to “bump” up questions which are not being answered to someone more likely to be able to answer them. Simple questions can be answered by many people, many of them even newbies who ran into the issue themselves.
>
> The main issue is that the more complex the question, the fewer people there are who can answer it and those people’s bandwidth is already clogged by other questions.
>
> We could for example try to create tags on SO for “basic questions”, “medium”, “advanced”. Provide guidelines to ask first on basic, if not answered after X days then add the medium tag etc. Downvote people who don’t go by the process. This would mean that committers for example can look at advanced only tag and have a manageable number of questions they can help with while others can answer medium and basic.
>
> I agree that some things are not good for SO. Basically stuff which asks for opinion is such but most cases in the mailing list are either “how do I solve this bug” or “how do I do X”. Either of those two is good for SO.
>
> Assaf.
>
> From: rxin [via Apache Spark Developers List] [mailto:ml-node+[hidden email]]
> Sent: Monday, November 07, 2016 8:33 AM
> To: Mendelson, Assaf
> Subject: Re: Handling questions in the mailing lists
>
> This is an excellent point. If we do go ahead and feature SO as a way for users to ask questions more prominently, as someone who knows SO very well, would you be willing to help write a short guideline (ideally the shorter the better, which makes it hard) to direct what goes to user@ and what goes to SO?
>
> On Sun, Nov 6, 2016 at 9:54 PM, Maciej Szymkiewicz <[hidden email]> wrote:
>
> Damn, I always thought that mailing list is only for nice and welcoming people and there is nothing to do for me here >:)
>
> To be serious though, there are many questions on the users list which would fit just fine on SO but it is not true in general. There are dozens of questions which are too broad, opinion based, ask for external resources and so on. If you want to direct users to SO you have to help them to decide if it is the right channel. Otherwise it will just create a really bad experience for both those seeking help and active answerers. The former will be downvoted and bashed; the latter will have to deal with handling all the junk, and the number of active Spark users with moderation privileges is really low (with only Massg and me being able to directly close duplicates).
>
> Believe me, I've seen this before.
>
> On 11/07/2016 05:08 AM, Reynold Xin wrote:
>
> You have substantially underestimated how opinionated people can be on mailing lists too :)
>
> On Sunday, November 6, 2016, Maciej Szymkiewicz <[hidden email]> wrote:
>
> You have to remember that Stack Overflow crowd (like me) is highly opinionated, so many questions, which could be just fine on the mailing list, will be quickly downvoted and / or closed as off-topic. Just saying...
>
> --
> Best,
> Maciej
>
> On 11/07/2016 04:03 AM, Reynold Xin wrote:
>
> OK I've checked on the ASF member list (which is private so there is no public archive).
>
> It is not against any ASF rule to recommend StackOverflow as a place for users to ask questions. I don't think we can or should delete the existing user@spark list either, but we can certainly make SO more visible than it is.
>
> On Wed, Nov 2, 2016 at 10:21 AM, Reynold Xin <[hidden email]> wrote:
>
> Actually after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties
Re: Handling questions in the mailing lists
Actually, after talking with more ASF members, I believe the only policy is that development decisions have to be made and announced on ASF properties (the dev list or JIRA), but user questions don't have to be. I'm going to double-check this. If it is true, I would actually recommend moving the Q&A part of the user list entirely over to Stack Overflow, or at least making that the recommended way, rather than the existing user list, which is not very scalable.
Re: Handling questions in the mailing lists
We’ve discussed upgrading our communication tools several times, as far back as 2014 and maybe even before that. The bottom line is that we can’t, due to ASF rules requiring the use of ASF-managed mailing lists.

For some history, see these discussions:

- https://mail-archives.apache.org/mod_mbox/spark-user/201412.mbox/%3CCAOhmDzfL2COdysV8r5hZN8f=NqXM=f=oy5no2dhwj_kveop...@mail.gmail.com%3E
- https://mail-archives.apache.org/mod_mbox/spark-user/201501.mbox/%3CCAOhmDzec1JdsXQq3dDwAv7eLnzRidSkrsKKG0xKw=tktxy_...@mail.gmail.com%3E

(It’s ironic that it’s difficult to follow the past discussion on why we can’t change our official communication tools due to those very tools…)

Nick
Re: Handling questions in the mailing lists
I feel Assaf's point is quite relevant if we want to move this project forward from the Spark user perspective (as I do). In fact, we're still using 20th-century tools (mailing lists) with some add-ons (like Stack Overflow).

As usual, Sean's and Cody's contributions are very much to the point. I feel it is indeed a matter of culture (hard to enforce) and tools (much easier). Isn't it?
Re: Handling questions in the mailing lists
So, concrete things people could do:

- Users could tag subject lines appropriately for the component they're asking about.
- Contributors could monitor user@ for tags relating to components they've worked on. I'd be surprised if my miss rate for any mailing list questions well-labeled as Kafka was higher than 5%.
- Committers could be more aggressive about soliciting and merging PRs to improve documentation. It's a lot easier to answer even poorly-asked questions with a link to relevant docs.
Re: Handling questions in the mailing lists
There's already reviews@ and issues@. dev@ is for project development itself and I think it is OK. You're suggesting splitting up user@, and I sympathize with the motivation. Experience tells me that we'll have a beginner@ that's then totally ignored, people will quickly learn to post to advanced@ to get attention, and we'll be back where we started. Putting it in JIRA doesn't help. I don't think this is a problem that is merely down to lack of process. It actually requires cultivating a culture change on the community list.
RE: Handling questions in the mailing lists
What I am suggesting is basically to fix that.

For example, we might say that mailing list A is only for voting and mailing list B is only for PRs, and have something like Stack Overflow for developer questions (I would even go as far as to have beginner, intermediate and advanced mailing lists for users and beginner/advanced for dev).

This could easily be done using Stack Overflow tags; however, that would probably be harder to manage.

Maybe use special JIRA tags and manage it in JIRA?

Anyway, as I said, the main issue is not user questions (except maybe advanced ones) but rather dev questions. It is so easy to get lost in the chatter that it makes it very hard for people to learn Spark internals…

Assaf.
Re: Handling questions in the mailing lists
I think that unfortunately mailing lists don't scale well. This one has thousands of subscribers with different interests and levels of experience. For any given person, most messages will be irrelevant. I also find that a lot of questions on user@ are not well-asked, aren't an SSCCE ( http://sscce.org/), not something most people are going to bother replying to even if they could answer. I almost entirely ignore user@ because there are higher-priority channels like PRs to deal with, that already have hundreds of messages per day. This is why little of it gets an answer -- too noisy. We have to have official mailing lists, in any event, to have some official channel for things like votes and announcements. It's not wrong to ask questions on user@ of course, but a lot of the questions I see could have been answered with research of existing docs or looking at the code. I think that given the scale of the list, it's not wrong to assert that this is sort of a prerequisite for asking thousands of people to answer one's question. But we can't enforce that. The situation will get better to the extent people ask better questions, help other people ask better questions, and answer good questions. I'd encourage anyone feeling this way to try to help along those dimensions. On Wed, Nov 2, 2016 at 11:32 AM assaf.mendelson <assaf.mendel...@rsa.com> wrote: > Hi, > > I know this is a little off topic but I wanted to raise an issue about > handling questions in the mailing list (this is true both for the user > mailing list and the dev but since there are other options such as stack > overflow for user questions, this is more problematic in dev). > > Let’s say I ask a question (as I recently did). Unfortunately this was > during spark summit in Europe so probably people were busy. In any case no > one answered. > > The problem is, that if no one answers very soon, the question will almost > certainly remain unanswered because new messages will simply drown it. 
>
> This is a common issue not just for questions but for any comment or idea
> which is not immediately picked up.
>
> I believe we should have a method of handling this.
>
> Generally, I would say these types of things belong in stack overflow,
> after all, the way it is built is perfect for this. More seasoned spark
> contributors and committers can periodically check out unanswered questions
> and answer them.
>
> The problem is that stack overflow (as well as other targets such as the
> databricks forums) tend to have a more user based orientation. This means
> that any spark internal question will almost certainly remain unanswered.
>
> I was wondering if we could come up with a solution for this.
>
> Assaf.
Handling questions in the mailing lists
Hi,

I know this is a little off topic, but I wanted to raise an issue about handling questions in the mailing lists. This is true for both the user and the dev mailing lists, but since there are other options such as Stack Overflow for user questions, it is more problematic in dev.

Let's say I ask a question (as I recently did). Unfortunately, this was during Spark Summit in Europe, so people were probably busy. In any case, no one answered.

The problem is that if no one answers very soon, the question will almost certainly remain unanswered, because new messages will simply drown it out. This is a common issue not just for questions but for any comment or idea that is not immediately picked up.

I believe we should have a method of handling this.

Generally, I would say these types of things belong on Stack Overflow; after all, the way it is built is perfect for this. More seasoned Spark contributors and committers can periodically check unanswered questions and answer them. The problem is that Stack Overflow (as well as other venues such as the Databricks forums) tends to be user-oriented. This means that any question about Spark internals will almost certainly remain unanswered.

I was wondering if we could come up with a solution for this.

Assaf.