Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-27 Thread David Udelson
Stephen commented on my proposal that instead of storing all downloads on
the server, we should only cache recent ones and build the others on
request, because many archives are only downloaded once when they are
ported to a different archiver. Since this contradicts Aurelien's
suggestion, I'm going to leave my proposal as-is for now.
Hopefully the link to this thread in my proposal will be a source of
up-to-date information on project constraints during the application review
process.
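
Stephen's suggestion (keep only recently built downloads on disk and build the rest on request) could be sketched roughly as follows. This is not HyperKitty code: `get_archive`, `prune_cache`, the `build` callback and the one-hour `max_age` are hypothetical, and the cache is assumed to be plain files in one directory.

```python
import os
import time


def get_archive(path, build, max_age=3600, now=None):
    """Return `path`, calling the hypothetical `build(path)` callback first
    if the cached file is missing or older than `max_age` seconds."""
    now = time.time() if now is None else now
    try:
        fresh = now - os.path.getmtime(path) < max_age
    except FileNotFoundError:
        fresh = False
    if not fresh:
        build(path)  # e.g. render the requested mbox into `path`
    return path


def prune_cache(directory, max_age=3600, now=None):
    """Delete cached archives that were not rebuilt within `max_age` seconds
    and return the names of the removed files."""
    now = time.time() if now is None else now
    removed = []
    for name in sorted(os.listdir(directory)):
        full_path = os.path.join(directory, name)
        if os.path.isfile(full_path) and now - os.path.getmtime(full_path) >= max_age:
            os.remove(full_path)
            removed.append(name)
    return removed
```

A periodic job could call something like `prune_cache`, so archives nobody requested recently stop taking up disk space, while popular ones are served from the cache.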


Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-26 Thread Aurelien Bompard
 In my proposal I suggested using any of several asynchronous job queue
 libraries, such as Celery or Huey. These can use Redis as a back-end.
 Because I have no experience with asynchronous job queues, I'm not sure if
 this is too much baggage for our purposes. Maybe we just don't want the
 extra dependencies.

Yeah, we don't want to add another database or an AMQP server just for
that. We must keep it simple for admins to deploy.

 Regarding cron jobs, there's also django-background-task, which is a simple
 Django add-on that might do what we need. Again, if we don't want/need the
 extra dependency, rolling our own cron job should be fairly
 straightforward.

I'm already using the jobs infrastructure provided by the
django-extensions package:
http://django-extensions.readthedocs.org/en/latest/jobs_scheduling.html
I did consider django-background-task but django-extensions seemed
like a better fit, because django-background-task seems written for
delayed tasks, not periodic tasks (well, a task could call itself
again when done, but it seems like a hack). I'm not opposed to
switching to django-background-task if we use the delayed job
feature or if we need the extra flexibility of choosing exactly how
many seconds apart we want our tasks to run.

 If we choose to pre-build the mbox files, we can't simply have them
 served through the webserver, because some lists are private

 Then there is also an authentication step?

Yeah, we must use HyperKitty's authentication and check if the user is
allowed to see the archive. So the files can't be served by the
webserver like static files.

 I noticed on the test server
 that I can't actually look at any of the mailing lists because they're all
 private.

If you're looking at lists.stg.fedoraproject.org, it's currently very
outdated (still running the Python2-compatible branch of Mailman 3). I
have another test server with more current info if you want, but I
break it regularly. It's lists-dev.cloud.fedoraproject.org

 When we create the mbox file, do we simply note that an attachment existed
 (e.g. Attachment: myattachment.txt) or do we actually put the attachment
 in the mbox? AFAIK mbox is a plaintext format, so if the latter is the case
 then I'm not exactly sure how this would work...

We do put the attachment in the mbox, as a MIME component like in
every email. If you choose view source when looking at an email with
attachments, you'll see how it's done.
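
To make the "MIME component" point concrete, here is a small sketch using only Python's standard `email` and `mailbox` modules (not HyperKitty's export code): the attachment travels inside the message as an ordinary MIME part, and an mbox file simply stores the whole serialized message. The addresses, file name and payload are made up, and `build_message_with_attachment` / `attachment_names` are hypothetical helpers.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def build_message_with_attachment():
    """Build a multipart/mixed message: a text body plus one binary attachment."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["To"] = "list@example.com"
    msg["Subject"] = "Quarterly report"
    msg.set_content("The report is attached.")
    # add_attachment() converts the message into multipart/mixed
    msg.add_attachment(b"%PDF-1.4 placeholder bytes",
                       maintype="application", subtype="pdf",
                       filename="report.pdf")
    return msg


def attachment_names(mbox_path):
    """List the filenames of all attachments stored in an mbox file."""
    box = mailbox.mbox(mbox_path)
    try:
        return [part.get_filename()
                for message in box
                for part in message.walk()
                if part.get_filename()]
    finally:
        box.close()


path = os.path.join(tempfile.mkdtemp(), "archive.mbox")
box = mailbox.mbox(path)  # creates the file if it does not exist
box.add(build_message_with_attachment())
box.close()  # close() flushes pending writes
print(attachment_names(path))  # ['report.pdf']
```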

 Are there going to be any issues handling Unicode (non-ASCII) characters or
 with file locks? Right now it looks like we should only have one process
 handling the mbox, but is it possible that more than one could be spawned
 somehow?

No, mbox files are not designed for concurrent writes, so it's better
to have a single process write to them.
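
Keeping a single writer process is the simplest guarantee. As a safety net, Python's `mailbox.mbox` also provides `lock()` and `unlock()` (fcntl plus dot-file locking on Unix), and `lock()` raises `mailbox.ExternalClashError` instead of waiting when another process already holds the lock. A minimal sketch, assuming an uncompressed mbox on local disk; `append_batch` and `make_message` are hypothetical helpers:

```python
import mailbox
from email.message import EmailMessage


def append_batch(mbox_path, messages):
    """Append messages while holding the mbox lock, so an unexpected second
    writer gets mailbox.ExternalClashError instead of corrupting the file."""
    box = mailbox.mbox(mbox_path)
    box.lock()
    try:
        for msg in messages:
            box.add(msg)
        box.flush()  # write everything out before releasing the lock
    finally:
        box.unlock()
        box.close()


def make_message(n):
    """Build a trivial test message."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["Subject"] = f"Message {n}"
    msg.set_content(f"Body {n}")
    return msg
```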

 Another possible nice-to-have feature I thought of yesterday is a
 download link that scripts can use to get archives (e.g.
 /download?year=x&month=y). On the other hand, maybe this is just a
 security risk that has no actual use case, but I'd still like to have a
 second opinion on this.

Well, there still is the authentication issue.
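
If a scripted download link were added, parsing the request is the easy part; authentication remains the open question. A minimal sketch of validating the hypothetical `/download?year=x&month=y` parameters with the standard library (`parse_download_period` and its error handling are assumptions, and authentication is deliberately left out):

```python
from urllib.parse import parse_qs, urlsplit


def parse_download_period(url):
    """Extract (year, month) from a URL like /download?year=2015&month=3.

    Raises ValueError for missing, non-numeric or out-of-range values."""
    query = parse_qs(urlsplit(url).query)
    try:
        year = int(query["year"][0])
        month = int(query["month"][0])
    except (KeyError, ValueError):
        raise ValueError("year and month must both be given as integers") from None
    if not 1 <= month <= 12:
        raise ValueError("month must be between 1 and 12")
    return year, month


print(parse_download_period("/download?year=2015&month=3"))  # (2015, 3)
```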

Aurélien
___
Mailman-Developers mailing list
Mailman-Developers@python.org
https://mail.python.org/mailman/listinfo/mailman-developers
Mailman FAQ: http://wiki.list.org/x/AgA3
Searchable Archives: 
http://www.mail-archive.com/mailman-developers%40python.org/
Unsubscribe: 
https://mail.python.org/mailman/options/mailman-developers/archive%40jab.org

Security Policy: http://wiki.list.org/x/QIA9

Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-26 Thread David Udelson
 I'm already using the jobs infrastructure provided by the
 django-extensions package:
 http://django-extensions.readthedocs.org/en/latest/jobs_scheduling.html

Cool. I didn't know about this extension, but it looks like it does what we
need. So the background process would be its own file in the jobs
directory, and we could leave it to the admin to set up the crontab?

 I have another test server with more current info if you want, but I
 break it regularly. It's lists-dev.cloud.fedoraproject.org

Thanks for linking this. I got my own local dev server working yesterday,
but this one is much more populated.

 We do put the attachment in the mbox, as a MIME component like in
 every email.

I see how this works now. Are the attachments always Base64 encoded?
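
For reference, MIME does not require Base64: each part declares its own Content-Transfer-Encoding (7bit, 8bit, quoted-printable or base64), chosen by whatever software generated the message. A quick illustration with Python's standard `email` package (not HyperKitty code; the file names and contents are made up, and `attachment_encodings` is a hypothetical helper): a short ASCII text attachment comes out as 7bit, while raw bytes get base64.

```python
from email.message import EmailMessage


def attachment_encodings(msg):
    """Map each attachment's filename to its Content-Transfer-Encoding."""
    return {part.get_filename(): str(part["Content-Transfer-Encoding"])
            for part in msg.iter_attachments()}


msg = EmailMessage()
msg["Subject"] = "Encoding demo"
msg.set_content("Two attachments follow.")
# Short ASCII text: the stdlib picks 7bit, no Base64 needed
msg.add_attachment("plain ASCII notes\n", filename="notes.txt")
# Raw bytes: encoded as base64 by default
msg.add_attachment(b"\x89PNG\r\n\x1a\n", maintype="image", subtype="png",
                   filename="logo.png")
print(attachment_encodings(msg))  # {'notes.txt': '7bit', 'logo.png': 'base64'}
```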

 Another possible nice-to-have feature I thought of yesterday is a
 download link that scripts can use to get archives (e.g.
 /download?year=x&month=y). On the other hand, maybe this is just a
 security risk that has no actual use case, but I'd still like to have a
 second opinion on this.

 Well, there still is the authentication issue.

I guess getting the scripts to authenticate would be a little complicated,
but otherwise does this seem like something worth including? If my proposal
gets accepted, I'm ok with leaving this as an open question until it
becomes clear whether or not I'm going to have extra time at the end of the
summer.

Thanks,
David





Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-25 Thread Aurelien Bompard
Hey David, here are my thoughts on the challenges:

 1) Determine which messages to include in the mbox.
 An entire list archive is clearly one choice, but is there also
 interest in generating mbox files for specific threads, list archives
 between specific dates, etc.?

Hmm, depending on the architecture we choose, we may not have a lot of
options. I'd like to see at least whole-list and last 30 days
archives though, this last one being useful to those who want to use
their mail client and seed it with the latest discussion to reply
in-thread.
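
A "last 30 days" archive is essentially a date filter over the list's messages. A minimal sketch over plain `email` message objects (HyperKitty would more likely filter with a database query; `messages_since` is a hypothetical name):

```python
from datetime import datetime, timedelta, timezone
from email.message import EmailMessage
from email.utils import format_datetime, parsedate_to_datetime


def messages_since(messages, days=30, now=None):
    """Keep only messages whose Date header falls within the last `days` days."""
    now = now if now is not None else datetime.now(timezone.utc)
    cutoff = now - timedelta(days=days)
    recent = []
    for msg in messages:
        sent = parsedate_to_datetime(str(msg["Date"]))
        if sent.tzinfo is None:  # a "-0000" zone parses as naive; treat it as UTC
            sent = sent.replace(tzinfo=timezone.utc)
        if sent >= cutoff:
            recent.append(msg)
    return recent


def make_dated_message(when):
    """Build a bare message carrying only a Date header."""
    msg = EmailMessage()
    msg["Date"] = format_datetime(when)
    return msg
```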

 2) For each message, append plaintext to mbox file.
 Is this the part where we risk blocking the UI? Certainly for
 hundreds of thousands of messages, this will be a computationally intensive
 step, so will this have to be run in a separate thread?

Yeah, with a lot of messages, and with possible attachments, we may be
creating hundreds of megabytes or maybe gigabytes of data. This has to
be done outside of the webserver process, so we'll need something like
a task queue and a daemon process or a cron job. Or we could be
building and appending to the mbox files when new messages arrive,
which would take up more disk space but would be more fluid from a UI
point of view. It would also probably be much more resource-intensive
than a cron job, because the mbox files will be large and should be
gzipped, so it would be better to append a batch of emails than
opening and closing on each incoming email.
I'm leaning towards pre-rendering the mbox files in a regular cron job
and warning the user in the UI that the archive contains all email up
to the last hour, for example.
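
Appending a batch from a cron job does not necessarily require decompressing and recompressing the whole archive: the gzip format (RFC 1952) allows several compressed members to be concatenated, and Python's `gzip` module reads such a file back as one stream. A minimal sketch, assuming one gzipped mbox file per list; `append_batch_gz` and `make_message` are hypothetical helpers:

```python
import gzip
import io
from email.generator import BytesGenerator
from email.message import EmailMessage


def append_batch_gz(path, messages):
    """Serialize messages in mbox format and append them to a gzipped mbox
    as one new gzip member, leaving the existing data untouched."""
    buf = io.BytesIO()
    for msg in messages:
        # mangle_from_ escapes body lines starting with "From " as ">From "
        gen = BytesGenerator(buf, mangle_from_=True)
        # unixfrom=True writes the mbox "From " separator line
        # (a "From nobody <date>" placeholder when the message has none)
        gen.flatten(msg, unixfrom=True)
        buf.write(b"\n")  # blank line separating messages
    with gzip.open(path, "ab") as gz:
        gz.write(buf.getvalue())


def make_message(n):
    """Build a test message whose body starts with "From "."""
    msg = EmailMessage()
    msg["From"] = "alice@example.com"
    msg["Subject"] = f"Message {n}"
    msg.set_content(f"From here on, this is body {n}.")
    return msg
```
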
We can't use the prototype archiver because we need to filter the
messages' content and escape email addresses to protect them from spam
harvesters, like MM2.1 currently does.
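
For the address escaping, Mailman 2.1's archives show addresses as "user at example.com" instead of "user@example.com". A rough approximation of that idea; the regular expression is a simplified assumption, not Mailman's actual pattern, and `obfuscate_addresses` is a hypothetical name:

```python
import re

# Simplified pattern: a local part, "@", then a dotted domain name
_ADDRESS_RE = re.compile(r"([\w.+-]+)@([\w-]+(?:\.[\w-]+)+)")


def obfuscate_addresses(text):
    """Replace user@domain with 'user at domain' to frustrate naive harvesters."""
    return _ADDRESS_RE.sub(r"\1 at \2", text)


print(obfuscate_addresses("Contact alice.smith@example.org for details."))
# Contact alice.smith at example.org for details.
```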

 3) Present mbox file to user for download.
 I'm hoping this is a trivial step, but I'm not sure about some of the
 specifics. For example, is Hyperkitty only able to run on apache, or is the
 choice of web server entirely up to the web admin? How we ultimately serve
 the file will depend on these details.

HyperKitty runs on Django, which can be served by whichever
WSGI-compliant server the admin chooses (Apache's mod_wsgi, uWSGI,
gunicorn, etc.). If we choose to pre-build the mbox files, we can't
simply have them served through the webserver, because some lists are
private (only available to subscribers).
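
Because of private lists, the download has to go through HyperKitty (a Django view, which could then stream the file with Django's `FileResponse`) rather than being served as a static file. The access rule itself might look like this framework-free sketch; `MailingList`, its fields and `can_download` are hypothetical stand-ins, not HyperKitty's models:

```python
from dataclasses import dataclass, field


@dataclass
class MailingList:
    """Hypothetical stand-in for a list's archive settings."""
    name: str
    private: bool
    subscribers: set = field(default_factory=set)


def can_download(user_email, mlist):
    """Public archives are open to anyone; private ones only to subscribers.

    `user_email` is None for anonymous (not logged-in) requests."""
    if not mlist.private:
        return True
    return user_email is not None and user_email in mlist.subscribers
```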

I hope that clarifies things a bit.

Aurélien

Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-25 Thread David Udelson
Thanks for your feedback Aurelien.

 we'll need something like a task queue and a daemon process or a cron job

In my proposal I suggested using any of several asynchronous job queue
libraries, such as Celery or Huey. These can use Redis as a back-end.
Because I have no experience with asynchronous job queues, I'm not sure if
this is too much baggage for our purposes. Maybe we just don't want the
extra dependencies. Regarding cron jobs, there's also django-background-task
(https://github.com/lilspikey/django-background-task), which is a simple
Django add-on that might do what we need. Again, if we don't want/need the
extra dependency, rolling our own cron job should be fairly
straightforward.

 If we choose to pre-build the mbox files, we can't simply have them
 served through the webserver, because some lists are private

Then there is also an authentication step? I noticed on the test server
that I can't actually look at any of the mailing lists because they're all
private. So we should be able to use pre-existing code for this step?

 with possible attachments, we may be creating hundreds of megabytes or
 maybe gigabytes of data

When we create the mbox file, do we simply note that an attachment existed
(e.g. Attachment: myattachment.txt) or do we actually put the attachment
in the mbox? AFAIK mbox is a plaintext format, so if the latter is the case
then I'm not exactly sure how this would work...

Are there going to be any issues handling Unicode (non-ASCII) characters or
with file locks? Right now it looks like we should only have one process
handling the mbox, but is it possible that more than one could be spawned
somehow?
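
On the Unicode question: an mbox file is just a sequence of bytes, and Python's standard `email` package handles non-ASCII text by declaring a charset for the body and using RFC 2047 encoded words for headers. A minimal round-trip sketch (standard library only; `roundtrip` is a hypothetical helper and the text is made up):

```python
import email
import io
from email import policy
from email.generator import BytesGenerator
from email.message import EmailMessage


def roundtrip(msg):
    """Serialize `msg` to bytes, as it would be stored in an mbox,
    then parse those bytes back into a message."""
    buf = io.BytesIO()
    BytesGenerator(buf).flatten(msg)
    raw = buf.getvalue()
    return raw, email.message_from_bytes(raw, policy=policy.default)


msg = EmailMessage()
msg["Subject"] = "Réunion"
msg.set_content("Café ☕ et crème brûlée")
raw, parsed = roundtrip(msg)
print(parsed["Subject"])             # Réunion
print(parsed.get_content().strip())  # Café ☕ et crème brûlée
```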

Another possible nice-to-have feature I thought of yesterday is a
download link that scripts can use to get archives (e.g.
/download?year=x&month=y). On the other hand, maybe this is just a
security risk that has no actual use case, but I'd still like to have a
second opinion on this.

Additionally, here are some tentative weekly goals I have for the project.
Feedback on the order/plausibility of these would be awesome!

Week 1) Given an email message, the message headers and body are extracted
and stored in a local file in mbox format. All unit tests passing.
Week 2) Attachments are represented in the mbox file as well. Email
addresses are escaped. There are no encoding errors (no boxes or ?s). All
unit tests passing.
Week 3) Explore options for possible asynchronous queue managers.
Weeks 4-5) When a mailing list archive is created, a background process
(implemented using the chosen backend) is attached to it for managing its mbox
files. Existing processes are started when the server starts, and the
server can efficiently manage all of these (possibly tens or hundreds of)
tasks. All unit tests passing.
Week 6) Clean up code and tests before the midterm review. All unit tests passing.
Weeks 7-8) Each background process decompresses two mbox files, one for the
entire list and one for the past month, appends any messages that have come in
during the past hour (in mbox format), and recompresses both archives. All unit
tests passing.
Weeks 9-10) Mbox archives are served by HyperKitty upon request. HyperKitty
does not authenticate users at this point. All unit tests passing.
Week 11) HyperKitty authenticates the user before serving the mbox request.
If the request is denied, the user is notified via the UI. All unit tests
passing.
Week 12) Code review and cleanup, final check on unit tests (they should
all be passing).

Thanks,
David


Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-25 Thread Stephen J. Turnbull
David Udelson writes:

  Thanks for your feedback Aurelien.
  
   we'll need something like a task queue and a daemon process or a cron job
  
  In my proposal I suggested using any of several asynchronous job queue
  libraries, such as Celery or Huey.  [...]

So far, this is good discussion for the list.  IMO, you want to link
to the archives for this thread in your proposal on Melange.

  Additionally, here are some tentative weekly goals I have for the
  project.  Feedback on the order/plausibility of these would be
  awesome!

IMHO, this part should just have been directly edited into the
proposal on Melange with a remark here that you did it.  Most of the
people on this list are either uninterested or not competent to
comment on the list of goals.  Alternatively, do both but it should
definitely be in your proposal already.



Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-25 Thread David Udelson
My proposal has been updated. I apologize if I breached mailing-list
etiquette, I'll get the hang of this eventually :)



Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-25 Thread Stephen J. Turnbull
David Udelson writes:
  My proposal has been updated. I apologize if I breached mailing-list
  etiquette, I'll get the hang of this eventually :)

No etiquette problem.  Those are my personal views about effective use
of the mailing list and Melange, respectively.  I suspect from past
experience my views reflect those of other mentors and mailman
developers pretty well, but if they don't, I'm sure they'll speak up. :-)

There's no doubt about updating Melange from your point of view: we
mentors are human, we get a ton of mail (Melange pings us on every
change to any proposal in our org, and several of us are org admins,
which increases the mail significantly), and mail is easy to drop on the
floor unless you deal with it completely immediately.  Unfortunately,
we do drop mail that way.

OTOH, in theory your proposal document on Melange can always reflect
your best efforts in one place, and we won't miss it. ;-)



Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-24 Thread Aurelien Bompard
 I'm interested in contributing to the Hyperkitty archiver. Specifically,
 it looks like some requested features for Hyperkitty include RSS syndication
 for entire mailing lists/specific users/specific threads, and the ability
 to view entire threads as plaintext and download that plaintext.

We don't have either feature yet, but I think RSS syndication would be
much too short for a GSoC project.
There's always been demand for a way to download a list archive as an
mbox file, and it's reasonably complicated if you want to avoid
blocking the UI, so this may be a good project similar to the
plaintext threads feature you mention.

Aurélien

Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-24 Thread David Udelson
I have submitted a proposal on Google Melange. Any feedback I could get
about my project constraints would be great!

Thanks,
David


Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-24 Thread David Udelson
Sent this email to Aurélien's personal email by mistake the first time
(sorry Aurélien!). I'll get used to the mailing list eventually.

 There's always been demand for a way to download a list archive as an
 mbox file

This is a feature I am interested in pursuing. From a very (very) high
level, the process of creating an mbox file out of a list archive should
entail:

1) Determine which messages to include in the mbox.
An entire list archive is clearly one choice, but is there also
interest in generating mbox files for specific threads, list archives
between specific dates, etc.?
2) For each message, append plaintext to mbox file.
Is this the part where we risk blocking the UI? Certainly for
hundreds of thousands of messages, this will be a computationally intensive
step, so will this have to be run in a separate thread?
3) Present mbox file to user for download.
I'm hoping this is a trivial step, but I'm not sure about some of the
specifics. For example, is Hyperkitty only able to run on Apache, or is the
choice of web server entirely up to the web admin? How we ultimately serve
the file will depend on these details.

Aurélien, just because I'm new to the codebase, what are the challenges you
see in implementing this feature? Whatever details you can give me will
help me set my milestones.

Thanks,
David


Re: [Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-23 Thread Terri Oda

On 2015-03-22 10:23 PM, David Udelson wrote:

I'm interested in contributing to the Hyperkitty archiver. Specifically, it
looks like some requested features for Hyperkitty include RSS syndication
for entire mailing lists/specific users/specific threads, and the ability
to view entire threads as plaintext and download that plaintext. I have the
following questions regarding these features:


I talked with David on IRC, but in case anyone else is wondering:

RSS feeds was a project a few years ago:
http://www.google-melange.com/gsoc/project/details/google/gsoc2013/joanna/5831844033462272

I don't know offhand if the code was merged into upstream, but my
preference, as always, is to make use of code we already have if we can, so
RSS feeds probably won't make for a good HyperKitty project this year.



 Terri





[Mailman-Developers] GSoC 15 - Interested in contributing to Hyperkitty

2015-03-22 Thread David Udelson
Hello,
My name is David Udelson. I am a freshman Computer Science
undergraduate at Cornell University. I have 3 years of experience with
Python under my belt, but I've never contributed to an open source project
because I've never known where to begin. I'm hoping GSoC will provide me
with that starting point, as well as an opportunity to make some meaningful
contributions to the open source community.

I'm interested in contributing to the Hyperkitty archiver. Specifically, it
looks like some requested features for Hyperkitty include RSS syndication
for entire mailing lists/specific users/specific threads, and the ability
to view entire threads as plaintext and download that plaintext. I have the
following questions regarding these features:

1) Are these features definitely expected to be in future releases, or are
they just nice-to-haves? If there are more important/pressing issues that
the Hyperkitty devs think I could implement instead, I would prefer to work
on those.
2) Are both of these features implementable within the GSoC timeline, or
should I only pursue one of them?

Thanks,
David