Re: [Reproducible-builds] [GSoC 2016] : Application review

2016-03-23 Thread Jérémy Bobbio
Satyam Zode:
> >> Official coding period.
> >> 3) Week 1 - 2 (May 27 - June 9):
> >>   - Work on  "Allow users to ignore arbitrary differences" part.
> >>   - Work simultaneously on unreproducible packages.
> >
> > How much time are you going to give to the community so they can review
> > your proposed user interfaces?
> >
> I think I will be ready with a design for the above by 1st May. After
> that, until 10th May, we can discuss user interfaces; from 11th May I
> will have exams, so I won't be available for active discussions. If
> some things remain to be discussed, we can always discuss them
> alongside the coding period.

Mh… I was not expecting you to work during the community bonding
period, but it's your call.

> >> 4) Week 3 - 4 (June 10 - June 22):
> >>   - Work on Parallel processing part.
> >>   - Work simultaneously on unreproducible packages.
> >
> > This is unlikely to work. Implementing parallel processing requires
> > deep focus because it's also about adding missing locks and
> > understanding subtle concurrency issues.
> >
> > How much experience do you have with concurrent programming?
> I have good experience with concurrent programming. I have written
> many concurrent programs in golang and I believe it'll help me here.
>
> > I think you underestimate how hard this is to get right. At the very
> > least you should be entirely focused on this and not fixing packages at
> > the same time.
> I understand that this is not going to be a piece of cake for me.

I believe it applies to you as well as anyone else. I've been working
myself on and off on that code for six months without getting it to a
point where it was stable enough to be usable! It's reassuring that you
have previous experience in concurrent programming.

> However, if we remove fixing of packages from this schedule then I
> will get enough time to concentrate on this particular problem.

Alright. :)

> In my opinion, there must be buffer time in the software development
> process for any unexpected incidents. Hence, I am planning to keep
> this time as buffer time. What do you think about it? (Of course, I
> will devote this time to community work only.)

If you feel you need buffer time, then let's call it that way. :)
I'd rather have it stated as such than promising stuff you should be
doing along the way.

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   


signature.asc
Description: Digital signature
___
Reproducible-builds mailing list
Reproducible-builds@lists.alioth.debian.org
http://lists.alioth.debian.org/cgi-bin/mailman/listinfo/reproducible-builds

Re: [Reproducible-builds] [GSoC 2016] : Application review

2016-03-22 Thread Jérémy Bobbio
Satyam Zode:
> As far as my research so far is concerned, a brief timeline looks like this:
> Design and Experiment:
> 1)  During the application screening:(March 26 - April 22)
>  1.1) Acquaint myself with diffoscope and research about proposed features.
>  1.2) Get hands-on experience with diffoscope.
>  1.3) Set up the development environment.
>  1.4) Track changes to the project roadmap in a publicly accessible document.
>  1.5) Prepare the project design and discuss it with the
> community.
> 2) Community Bonding Period: (April 23 - May 10)
>   2.1) Interact with the community and exchange information related to
> project design and working of diffoscope in different conditions.
>   2.2) Finalizing design and documenting same in the project design wiki.
>   2.3) Learning more about Debian community.

During that period I think it would be worthwhile to review packages and,
if there's one where you see an easy fix, submit patches. That way you
would get better insight into the various issues and diffoscope's limitations.

> Implementation:
> Official coding period.
> 3) Week 1 - 2 (May 27 - June 9):
>   - Work on  "Allow users to ignore arbitrary differences" part.
>   - Work simultaneously on unreproducible packages.

How much time are you going to give to the community so they can review
your proposed user interfaces?

> 4) Week 3 - 4 (June 10 - June 22):
>   - Work on Parallel processing part.
>   - Work simultaneously on unreproducible packages.

This is unlikely to work. Implementing parallel processing requires
deep focus because it's also about adding missing locks and
understanding subtle concurrency issues.

How much experience do you have with concurrent programming?
I think you underestimate how hard this is to get right. At the very
least you should be entirely focused on this and not fixing packages at
the same time.
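
To illustrate the kind of locking issue meant here, a minimal sketch (this
is not diffoscope code; `compare_pair` and `run_parallel` are hypothetical
names): worker threads update shared counters, and a lock guards the
read-modify-write so that no update is lost.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def compare_pair(pair):
    """Hypothetical stand-in for comparing two files: True if they differ."""
    left, right = pair
    return left != right


def run_parallel(pairs, workers=4):
    """Compare pairs concurrently and count differences in shared state."""
    stats = {"compared": 0, "different": 0}
    lock = threading.Lock()

    def worker(pair):
        differs = compare_pair(pair)
        # The counters are shared by all threads; without the lock the
        # read-modify-write below could lose updates.
        with lock:
            stats["compared"] += 1
            if differs:
                stats["different"] += 1

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # list() forces completion and re-raises any worker exception.
        list(pool.map(worker, pairs))
    return stats
```

Shared counters are the easy case; state hidden in caches or temporary
directories is typically where the missing locks are harder to find.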

> --- Mid-Term Evaluations
> 
> 
> 5) Week 5 - 7 (June 23 - July 13):
>   - Finish remaining work
>   - Start working on fuzzy matching algorithm.
> 6) Week 8 - 10 (July 14 - August 3):
>   - Finish fuzzy matching algorithm implementation.
>   - Work on new file-format comparators.

diffoscope already supports fuzzy matching via TLSH. It's implemented and
works nicely. But it only does so inside a container. That means that
when you compare foo.gz and foo.xz, it will not notice that foo might
actually be the same file. Three weeks for that feels like too much.
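
A minimal sketch of the foo.gz / foo.xz case, assuming the containers are
simply unpacked before comparing. The helper names are hypothetical, and
difflib is only a standard-library stand-in for TLSH; this is not how
diffoscope implements it.

```python
import difflib
import gzip
import lzma


def payload(name, data):
    """Return the decompressed content for the container types handled here."""
    if name.endswith(".gz"):
        return gzip.decompress(data)
    if name.endswith(".xz"):
        return lzma.decompress(data)
    return data


def similarity(name_a, data_a, name_b, data_b):
    """Similarity ratio in [0, 1] between the decompressed payloads.

    difflib stands in for a locality-sensitive hash such as TLSH.
    autojunk is disabled so repetitive content is not ignored.
    """
    a = payload(name_a, data_a)
    b = payload(name_b, data_b)
    return difflib.SequenceMatcher(None, a, b, autojunk=False).ratio()
```

The compressed bytes of the two files differ completely, but the ratio of
the payloads is 1.0 when foo is the same file in both.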

>  7) Week 11 (August 4 - August 13):
>   - Write tests for implemented features and comparators.

Big no here. Tests should be written prior to or during the development
of the various features. While the code coverage has never been 100%, at
least the basics should be covered. So please refine the timeline by
making enough room to write tests during the development.

>   - keep working on unreproducible Debian packages.
> 
> Documentation:
>  8)  Week 12 (August 15 - August 22): Suggested pencils down date
>   - Code refactoring.
>   - Finish documentation.

What kind of code refactoring are you thinking about?

What kind of documentation are you thinking about? Like tests, user
documentation should be written at the same time as, or maybe prior to,
the actual features.


Sorry if this starts to feel annoying, but I'd like us to avoid making
mistakes that I've seen several times in the past with other GSoC projects.

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   



Re: [Reproducible-builds] [GSoC 2016] : Application review

2016-03-21 Thread Satyam Zode
Hi all!

Jérémy Bobbio:
> Satyam Zode:
>> So this summer I intend to work on
>> 1) Improvements to diffoscope:
>> 1.1)  Allow users to ignore arbitrary differences (Addition of
>> ignore-profiles flag).
>> 1.2)  Perform fuzzy-matching across archives.
>> 1.3)  Finish parallel processing part.
>> The above points are mentioned on the GSoC wiki. There are also more
>> features mentioned in the wishlist
>> (https://reproducible-builds.org/events/athens2015/diffoscope-wishlist/)
>> I will try to cover some of those too.
>
> Sounds good. Be aware that the first part will require some design work.
> Finding the right UI regarding ignores might require input from the
> community. I'd recommend splitting this into two parts
> (design+experiment+survey / implementation+documentation). You probably
> will want to work on other things in parallel with the discussions.
>
I have started thinking about the design and will let you all know soon.
It would be great if you could give me some direction on designing the
proposed features.
(I will create a new thread for this.)
> Could you try to come up with rough estimations on how much time all of
> the above would require?
>
Yes! But first, I need some time for that (maybe the next 24 hours)
because I want to think more about the features proposed above (I don't
know how complex those features could be!).

Moving to the interesting part :)
> Fixing #818856 should not be too hard. If you could submit a patch, that
> would make me more confident that you could do all the above.
>
Thank you so much for giving me this task :-) . Now I know how
diffoscope can be tested and built from source. (Before this I was only
familiar with the codebase and diffoscope's functionality.)

I have fixed this issue. I have also fixed a link in the documentation.
Please find the patches attached.
Here is the output I got:

satyam@satyamz:~/Debian/experiment/diffoscope/bin$ mkdir foo bar
satyam@satyamz:~/Debian/experiment/diffoscope/bin$ touch foo/baz
satyam@satyamz:~/Debian/experiment/diffoscope/bin$ ln -s somefile bar/baz
satyam@satyamz:~/Debian/experiment/diffoscope/bin$ ./diffoscope foo bar
--- foo
+++ bar
├── stat {}
│ @@ -1,8 +1,8 @@
│
│Size: 4096   Blocks: 8  IO Block: 4096   directory
│   Links: 2
│  Access: (0755/drwxr-xr-x)  Uid: ( 1000/  satyam)   Gid: ( 1000/  satyam)
│
│ -Modify: 2016-03-21 17:22:54.004395232 +
│ +Modify: 2016-03-21 17:23:20.516394537 +
│
│   Birth: -
│   --- foo/baz
├── +++ bar/baz
│ @@ -0,0 +1,2 @@
│ +000: 6465 7374 696e 6174 696f 6e3a 2073 6f6d  destination: som
│ +010: 6566 696c 650a   efile.
│   ├── stat {}
│   │ @@ -1,8 +1,8 @@
│   │
│   │ -  Size: 0 Blocks: 0  IO Block: 4096   regular empty file
│   │ +  Size: 8 Blocks: 0  IO Block: 4096   symbolic link
│   │   Links: 1
│   │ -Access: (0644/-rw-r--r--)  Uid: ( 1000/  satyam)   Gid: ( 1000/  satyam)
│   │ +Access: (0777/lrwxrwxrwx)  Uid: ( 1000/  satyam)   Gid: ( 1000/  satyam)
│   │
│   │ -Modify: 2016-03-21 17:22:54.004395232 +
│   │ +Modify: 2016-03-21 17:23:20.516394537 +
│   │
│   │   Birth: -
│   ╵
╵
satyam@satyamz:~/Debian/experiment/diffoscope/bin$
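
The session above uses a dangling symlink (bar/baz points at a nonexistent
"somefile"). A minimal sketch of why the two size lookups behave
differently in that case; the function name is hypothetical and the code
is independent of diffoscope:

```python
import os
import tempfile


def sizes_for_dangling_symlink():
    """Create a dangling symlink and report how each size lookup behaves."""
    with tempfile.TemporaryDirectory() as tmp:
        link = os.path.join(tmp, "baz")
        os.symlink("somefile", link)  # the target does not exist

        try:
            followed = os.path.getsize(link)  # follows the link
        except OSError:
            followed = None  # stat() on the missing target fails

        # lstat() inspects the link itself; on Linux st_size is the
        # length of the target path string.
        own = os.lstat(link).st_size
        return followed, own
```

On Linux the second value is 8, the length of "somefile", which matches
the "Size: 8" line in the stat output above.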



> Most of the improvements we could think of have indeed been implemented
> since. :)
Nice :)
I would also like to know whether I will be able to edit the application
after the deadline.


Thanks again!
Satyam Zode
From 3e9aea18767099dffe62c14e7215aed54347a10f Mon Sep 17 00:00:00 2001
From: Satyam Zode 
Date: Mon, 21 Mar 2016 23:12:55 +0530
Subject: [PATCH 1/2] fixed issue related to diffoscope symlinks crashing

---
 diffoscope/comparators/binary.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/diffoscope/comparators/binary.py b/diffoscope/comparators/binary.py
index 9663214..5622a9c 100644
--- a/diffoscope/comparators/binary.py
+++ b/diffoscope/comparators/binary.py
@@ -183,7 +183,7 @@ class File(object, metaclass=ABCMeta):
         logger.debug('%s has_same_content %s', self, other)
         # try comparing small files directly first
         my_size = os.path.getsize(self.path)
-        other_size = os.path.getsize(other.path)
+        other_size = os.lstat(other.path).st_size
         if my_size == other_size and my_size <= SMALL_FILE_THRESHOLD:
             if open(self.path, 'rb').read() == open(other.path, 'rb').read():
                 return True
-- 
2.1.4

From 79809c35a402f1e28f1c3f7c94985274172c0415 Mon Sep 17 00:00:00 2001
From: Satyam Zode 
Date: Mon, 21 Mar 2016 23:25:27 +0530
Subject: [PATCH 2/2] Fixed link in documentation

---
 README.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/README.rst b/README.rst
index fe23a37..5288e52 100644
--- a/README.rst
+++ b/README.rst
@@ -83,7 +83,7 @@ system against the diffoscope package:
 Join the users and developers mailing-list:
 
 
-diffoscope website is at 
+diffoscope website is at 

Re: [Reproducible-builds] [GSoC 2016] : Application review

2016-03-21 Thread Jérémy Bobbio
Satyam Zode:
> So this summer I intend to work on
> 1) Improvements to diffoscope:
> 1.1)  Allow users to ignore arbitrary differences (Addition of
> ignore-profiles flag).
> 1.2)  Perform fuzzy-matching across archives.
> 1.3)  Finish parallel processing part.
> The above points are mentioned on the GSoC wiki. There are also more
> features mentioned in the wishlist
> (https://reproducible-builds.org/events/athens2015/diffoscope-wishlist/)
> I will try to cover some of those too.

Sounds good. Be aware that the first part will require some design work.
Finding the right UI regarding ignores might require input from the
community. I'd recommend splitting this into two parts
(design+experiment+survey / implementation+documentation). You probably
will want to work on other things in parallel with the discussions.

Could you try to come up with rough estimations on how much time all of
the above would require?

Fixing #818856 should not be too hard. If you could submit a patch, that
would make me more confident that you could do all the above.

> I guess better/smarter ELF diffing is under development (I have checked
> the git logs and diffoscope for this)

Most of the improvements we could think of have indeed been implemented
since. :)

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   



Re: [Reproducible-builds] [GSoC 2016] : Application review

2016-03-21 Thread Satyam Zode
Hi, everyone!

 Jérémy Bobbio:
> Thanks for your application. I much appreciate that it was done before
> the deadline. I also like that you are clear about your other commitments.
>
> I think it would have been a fine application for last summer, but we've
> made significant progress on several fronts since, therefore I'm not
> convinced that there's much work left in the tasks you propose. I think
> it would be better to aim for more precise tasks, e.g. toolchain
> software you'll be improving to solve classes of issues, or at least an
> outline of the features you intend to add to strip-nondeterminism or
> diffoscope.
>
> (If you feel you're part of the reproducible builds team and disagree
> with my comments, please say so!)

Thanks, Lunar, for this valuable feedback. Yes, I agree with you.
After reading the reasons you mentioned, and as a part of the
reproducible builds team, I don't think the work I proposed (which does
not need a whole summer) will help the reproducible builds effort much
either. But I think there are some issues which still need to be
fixed. There are some issues for which not even a single package has a
patch. I will try to look into those and search for
solutions.

So this summer I intend to work on
1) Improvements to diffoscope:
1.1)  Allow users to ignore arbitrary differences (Addition of
ignore-profiles flag).
1.2)  Perform fuzzy-matching across archives.
1.3)  Finish parallel processing part.
The above points are mentioned on the GSoC wiki. There are also more
features mentioned in the wishlist
(https://reproducible-builds.org/events/athens2015/diffoscope-wishlist/)
I will try to cover some of those too.
I guess better/smarter ELF diffing is under development (I have checked
the git logs and diffoscope for this)

2) Improving reproducibility of Debian packages:
In this section I will be fixing Debian packages and will try to find
solutions to the issues which do not have a solution yet. I am
trying to list such issues.


> If you look at packages identified as leaving timestamps in gzip
> headers, you'll see that most of them already have patches, and the ones
> that don't are affected by other issues
> https://tests.reproducible-builds.org/issues/unstable/timestamps_in_gzip_headers_issue.html
> These other issues probably deter maintainers' motivation to fix the
> problems with gzip timestamps.
>
> Almost all packages with varying mtimes in data.tar or control.tar have
> patches or have been fixed through toolchain improvements:
> https://tests.reproducible-builds.org/issues/unstable/varying_mtimes_in_data_tar_gz_or_control_tar_gz_issue.html
>
> It feels quite suboptimal to highlight user and groups in tarballs as
> separate issues as I think all are affected by other tarball related
> issues. They should be fixed at the same time:
> https://tests.reproducible-builds.org/issues/unstable/users_and_groups_in_tarball_issue.html
>
> Regarding timestamps due to C pre-processor macros, Dhole is waiting
> for GCC patch window to open again—which will be in April, IIRC.
> So unless you intend to work on adding support for SOURCE_DATE_EPOCH in
> clang, I'm not sure there's much work left on this issue. I believe that
> fixing the 400+ packages individually should not be undertaken if
> we can avoid it.
> https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01402.html
> https://wiki.debian.org/ReproducibleBuilds/TimestampsFromCPPMacros
>
> Emmanuel Bourg has been working on and fixing almost all Java-related
> issues in the course of the past year. I expect he'll probably work on
> fixing this locale-related javadoc issue in the near future. I guess you
> could coordinate with him to write the necessary patches, though.
> https://tests.reproducible-builds.org/issues/unstable/locale_in_documentation_generated_by_javadoc_issue.html
>

A big thanks to you, because I really didn't know about many of the
above things. It's good to know that people are already working on this
part. :-)

> These quick evaluations leave me with the feeling that your proposed
> schedule is currently not aligned with the actual needs of the
> reproducible builds effort.
>
> This probably means that progress can be made on making more visible
> areas that actually require work…

Please let me know what you think about the work I have now proposed.
I will frame the timeline accordingly.

PS: I really feel that I am part of reproducible builds and I want to
strengthen the bond by spending my summer working with reproducible
builds ;-)


Re: [Reproducible-builds] [GSoC 2016] : Application review

2016-03-19 Thread Jérémy Bobbio
Satyam Zode:
> I have written a project proposal (application) for the same. I kindly
> request you to review the application [1]. I need your valuable
> feedback and suggestions, which I will use to improve the
> application.
>  [1]: https://wiki.debian.org/SummerOfCode2016/StudentApplications/SatyamZode

Thanks for your application. I much appreciate that it was done before
the deadline. I also like that you are clear about your other commitments.

I think it would have been a fine application for last summer, but we've
made significant progress on several fronts since, therefore I'm not
convinced that there's much work left in the tasks you propose. I think
it would be better to aim for more precise tasks, e.g. toolchain
software you'll be improving to solve classes of issues, or at least an
outline of the features you intend to add to strip-nondeterminism or
diffoscope.

(If you feel you're part of the reproducible builds team and disagree
with my comments, please say so!)

If you look at packages identified as leaving timestamps in gzip
headers, you'll see that most of them already have patches, and the ones
that don't are affected by other issues:
https://tests.reproducible-builds.org/issues/unstable/timestamps_in_gzip_headers_issue.html
These other issues probably deter maintainers' motivation to fix the
problems with gzip timestamps.
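
For readers unfamiliar with this issue class: the gzip header carries a
modification time, so compressing the same data at two different moments
yields different bytes. A minimal Python sketch of pinning that field (the
function name is hypothetical; individual packages need the equivalent fix
in whatever tool they use):

```python
import gzip
import io


def gzip_deterministic(data, mtime=0):
    """Compress data with a fixed timestamp in the gzip header."""
    buf = io.BytesIO()
    # filename="" keeps the original file name out of the header as well.
    with gzip.GzipFile(filename="", fileobj=buf, mode="wb", mtime=mtime) as gz:
        gz.write(data)
    return buf.getvalue()
```

With the timestamp pinned, two runs over the same input produce identical
output.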

Almost all packages with varying mtimes in data.tar or control.tar have
patches or have been fixed through toolchain improvements:
https://tests.reproducible-builds.org/issues/unstable/varying_mtimes_in_data_tar_gz_or_control_tar_gz_issue.html

It feels quite suboptimal to highlight user and groups in tarballs as
separate issues as I think all are affected by other tarball related
issues. They should be fixed at the same time:
https://tests.reproducible-builds.org/issues/unstable/users_and_groups_in_tarball_issue.html
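
A minimal sketch of why these tarball issues belong together: mtimes,
owners and member order can all be normalized in the same place. The helper
names are hypothetical and this is plain Python `tarfile`, not the Debian
toolchain.

```python
import io
import tarfile


def normalize(info, mtime=0):
    """Clamp the metadata that commonly varies between builds."""
    info.mtime = mtime
    info.uid = info.gid = 0
    info.uname = info.gname = "root"
    return info


def build_tar(files, mtime=0):
    """Build an uncompressed tar from {name: bytes} with stable metadata."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w", format=tarfile.GNU_FORMAT) as tar:
        for name in sorted(files):  # fixed member order
            info = normalize(tarfile.TarInfo(name), mtime)
            info.size = len(files[name])
            tar.addfile(info, io.BytesIO(files[name]))
    return buf.getvalue()
```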

Regarding timestamps due to C pre-processor macros, Dhole is waiting
for GCC patch window to open again—which will be in April, IIRC.
So unless you intend to work on adding support for SOURCE_DATE_EPOCH in
clang, I'm not sure there's much work left on this issue. I believe that
fixing the 400+ packages individually should not be undertaken if
we can avoid it.
https://gcc.gnu.org/ml/gcc-patches/2015-12/msg01402.html
https://wiki.debian.org/ReproducibleBuilds/TimestampsFromCPPMacros
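
As a reminder of what supporting SOURCE_DATE_EPOCH means for a tool, a
minimal sketch in Python (the function names are hypothetical; the GCC
patch linked above does the equivalent for the pre-processor macros): use
the variable when it is set and fall back to the current time otherwise.

```python
import os
import time


def build_timestamp(environ=None):
    """Return the timestamp a build tool should embed.

    Uses SOURCE_DATE_EPOCH (seconds since the Unix epoch) when set and
    falls back to the current time otherwise.
    """
    environ = os.environ if environ is None else environ
    value = environ.get("SOURCE_DATE_EPOCH")
    if value is None:
        return int(time.time())
    return int(value)


def build_date(environ=None):
    """Format the timestamp in UTC so the local timezone does not leak in."""
    return time.strftime("%Y-%m-%d", time.gmtime(build_timestamp(environ)))
```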

Emmanuel Bourg has been working on and fixing almost all Java-related
issues in the course of the past year. I expect he'll probably work on
fixing this locale-related javadoc issue in the near future. I guess you
could coordinate with him to write the necessary patches, though.
https://tests.reproducible-builds.org/issues/unstable/locale_in_documentation_generated_by_javadoc_issue.html

These quick evaluations leave me with the feeling that your proposed
schedule is currently not aligned with the actual needs of the
reproducible builds effort.

This probably means that progress can be made on making more visible
areas that actually require work…

-- 
Lunar.''`. 
lu...@debian.org: :Ⓐ  :  # apt-get install anarchism
`. `'` 
  `-   

