Allowing comments might be helpful, though :-)

--
Kevin A. McGrail
Asst. Treasurer & VP Fundraising, Apache Software Foundation
Chair Emeritus Apache SpamAssassin Project
https://www.linkedin.com/in/kmcgrail - 703.798.0171

On Wed, Mar 21, 2018 at 12:36 AM, Rajkiran Rajkumar <rajkiran2...@gmail.com>
wrote:

> @Saahil, kindly make your doc view-only for people with a link to it.
> Giving edit permissions to the world is a bad idea.
>
> Thanks,
> Rajkiran
>
> On Tue, Mar 20, 2018 at 5:17 PM, Kevin A. McGrail <kmcgr...@apache.org>
> wrote:
>
>> +users
>>
>> All we give is feedback.  The submission to GSoC is what matters.  So if
>> you mentioned Perl here, that's not going to carry over to the reviewers.
>>
>> Can someone with fresh eyes take a look at this?  I read it too recently
>> so I will gloss over it too much.
>>
>> Here are some posts the mentors list thought might be helpful.  The first,
>> I believe, covers the point of view of someone who did not get selected.
>>
>> https://medium.freecodecamp.org/hacking-gsoc-how-to-gain-real-life-experience-and-support-open-source-b1e6a664f6e4?source=linkShare-53ba2bb84284-1521381334
>>
>> https://sanatt.me/2017/12/30/cracking-google-summer-code-2018/
>>
>> Regards, KAM
>>
>> On Tue, Mar 20, 2018, 03:57 Saahil Sirowa <cs16btech11...@iith.ac.in>
>> wrote:
>>
>>> Hi Kevin and Apache SpamAssassin Dev Community,
>>>
>>> I have resolved all the changes you suggested in the previous draft.
>>> 1) I mentioned learning Perl a week before the community bonding
>>> period. It will not take much time. I can assure you that the language is
>>> not going to be an issue.
>>> 2) I updated the biography part a bit.
>>> 3) Significant changes have been made in the Timeline.
>>> 4) I'm planning to use CMake/Travis CI for automated testing. If there
>>> is a better alternative, please do suggest it.
>>> 5) I gave links to the research papers that I will be reading in the
>>> timeline.
>>> 6) I updated the timeline to include gaining more advanced knowledge of
>>> email traffic and spam. I listed some links for that purpose.
>>> 7) I updated the credits.
>>> 8) Other changes were made in various parts of the proposal.
>>>
>>> Thanks for your previous detailed feedback.
>>>
>>> Here is the link to the updated proposal:
>>> GSoC 2018 proposal
>>> <https://docs.google.com/document/d/1-OCNv79sHvVViKwnrRYtlMiKWLCzz4xUW4tNOlmaTmw/edit#heading=h.q7h3lddabdvh>
>>> Please rigorously review it and suggest any changes that I should make.
>>>
>>> Awaiting a favorable response.
>>>
>>>
>>> Thanks...
>>> Saahil Sirowa
>>> B. Tech Computer Science and Engineering
>>> Indian Institute of Technology, Hyderabad
>>>
>>> On Mon, Mar 19, 2018 at 3:27 AM, Kevin A. McGrail <kmcgr...@apache.org>
>>> wrote:
>>>
>>>> Hi Saahil
>>>>
>>>> re: Perl. As the project is primarily in Perl and you do not list it, or
>>>> any similar language like PHP, in your Proficiencies, I would address
>>>> that.  The word Perl does not appear a single time.
>>>>
>>>> Your Biography is a little light on why this is something you feel you
>>>> can implement.  The mentors will likely NOT be able to help you with the
>>>> science, focusing instead on the community, processes, and open source in
>>>> general.
>>>>
>>>> re: Email and Spam, do you have any experience with email traffic or
>>>> spam?  If so, add it.  If not, explain what you plan to do to address that.
>>>>
>>>> Re: Deliverables, I think you'll need to propose the first draft of
>>>> that.  But your goal will likely be a plugin for Apache SpamAssassin that
>>>> can be installed and configured to provide multiple configurable
>>>> statistical analysis algorithms to better identify ham (good email) and/or
>>>> spam (bad email).
>>>>
>>>> Please use Apache SpamAssassin to properly brand the title.
>>>>
>>>> Re: Scheduling/timelines, I have no input except that past proposals
>>>> I have read have included more phases and did not add "optional"
>>>> items.  I'd prefer to see small increments to make sure you stay on
>>>> schedule and don't get overwhelmed and find yourself way behind as time
>>>> progresses.
>>>>
>>>> Re: Testing Methodology, this is likely the most critical missing
>>>> part.  I am a fan of test-driven development, where you set up tests that
>>>> should pass and fail and use continuous testing as you add code to confirm
>>>> your development is progressing well.
>>>>
>>>> This is especially important because spam analysis often doesn't work
>>>> the way people expect and tests w/statistics can help identify issues.
>>>>
>>>> For example, it is a hypothesis that these statistical algorithms will
>>>> be better than Bayes.  So you'll need a baseline for comparison.
>>>>
>>>> Additionally, even experts in the field are surprised when they think
>>>> something will prove the hamminess of an email but it in fact shows the
>>>> opposite.  Real-world example: SPF is a policy that, when introduced, was
>>>> supposed to allow an automated mechanism that says "this is an email from
>>>> a legitimate mail server for my domain".
>>>>
>>>> However, the FIRST wave of people to adopt it were all spammers.  So it
>>>> became a spam indicator more than a ham indicator.  It was a very
>>>> interesting outcome.
>>>>
>>>> Re: Corpora, you'll want a corpus of carefully hand-sorted ham and
>>>> spam.  Have you thought about how you'll get that?  I *might* be able to
>>>> help but it's 50/50.
>>>>
>>>> Re: You mention reading research papers on statistical algorithms from a
>>>> previous proposal.  You'll want to list them to show which ones you plan to
>>>> study.
>>>>
>>>> re: "Discussions with the SA community regarding the various types of
>>>> spams that the present SA can handle." is unclear.  What is a "type of
>>>> spam" to you?  Do you have a list of types of spam?
>>>>
>>>> re: "Brainstorming with the mentors and SA community about the various
>>>> input features and parameters that can have a huge impact on the overall
>>>> performance of the listed neural nets models." I think this is flawed.
>>>> There won't be a ton of people who can discuss this with you.  You'll
>>>> likely need to use the scientific process to show what has a performance impact.
>>>> This is not busy work or school work.  This is an experiment that has not
>>>> been tried at the SA project.
>>>>
>>>> re: "actively involved with the community." is a stretch.  A few emails
>>>> do not active involvement make.
>>>>
>>>> re: Bonding, you might consider raising that to 1-2 major bugs and
>>>> 10-20 minor bugs.
>>>>
>>>> Re: Credits/references, I would add more clarity about where each of
>>>> those references is used.
>>>>
>>>> Regards,
>>>> KAM
>>>>
>>>
>>>
>
