Re: Adding asfext:registered to projects.a.o?
How about something very modern - moving to JSON-LD schema.org annotations
in the root index of the project homepage and just fetching all of those..?

Seriously; keeping them under a single comdev control sounds most sensible
as I doubt the distributed DOAP files are well maintained. Projects can
raise pull requests to update and then see their changes live on the new
projects.apache.org pages.

On 11 Feb 2016 17:35, "sebb" wrote:

> On 11 February 2016 at 12:03, Shane Curcuru wrote:
> > I need to annotate our structured data set of Apache projects to track
> > which project names are registered trademarks.  This is needed to be
> > able to properly generate a.o/foundation/marks/list (which is currently
> > sadly outdated since it's manually built now).  This is a serious need
> > for Brand Management, since we regularly have third parties say "but you
> > didn't SAY it was your trademark, so I can do it anyway..."
> >
> > My thought is to annotate the PMC DOAP files with a registered marker,
> > then use the existing projects.a.o building of the organized data.  Then
> > use either JS or some cron static generation to display the actual
> > marks/list page.
>
> There are two kinds of RDF files:
> - the PMC RDF files [1] which are mainly stored in the comdev area
>   [2], though they can also be stored elsewhere.
>   The locations of the files are held in committees.xml [3]
>   [These are not actually DOAP files, though the format looks similar.]
>
> - the project DOAP files which are stored by individual projects; they
>   are listed in projects.xml [4]
>
> A single PMC RDF file can be associated with multiple DOAP files, e.g.
> Commons, Creadur, Tomcat all have multiple independent project
> releases.
>
> > Is annotating the project data sources the best idea, or should I simply
> > create a new stable URL data source that's just a list of registered
> > names, and join the tables?
>
> I doubt if either of the above file types are suitable.
Re: Adding asfext:registered to projects.a.o?
On Thu, Feb 11, 2016 at 2:38 PM, Stian Soiland-Reyes wrote:
> How about something very modern - moving to JSON-LD schema.org annotations
> in the root index of the project homepage and just fetching all of those..?
>
> Seriously; keeping them under a single comdev control sounds most sensible
> as I doubt the distributed DOAP files are well maintained. Projects can
> raise pull requests to update and then see their changes live on the new
> projects.apache.org pages

I agree: centralize first, and decentralize when the need shows itself.

As for format: let's prototype.  Seriously.  If Shane can provide some
initial test data in any format (e.g. CSV) I can convert that to YAML and
you can convert it to JSON-LD, and Shane can determine which would be
easier for him to maintain.

I'll also go the extra step and write a small script that converts it to
JSON (note: POJO, not LD), and write an ugly page that fetches and
displays that data.  Others can do likewise.  Shane should be able to use
these programs as examples and extend them as he sees fit.

- Sam Ruby

> On 11 Feb 2016 17:35, "sebb" wrote:
>
>> On 11 February 2016 at 12:03, Shane Curcuru wrote:
>> > I need to annotate our structured data set of Apache projects to track
>> > which project names are registered trademarks.  This is needed to be
>> > able to properly generate a.o/foundation/marks/list (which is currently
>> > sadly outdated since it's manually built now).  This is a serious need
>> > for Brand Management, since we regularly have third parties say "but you
>> > didn't SAY it was your trademark, so I can do it anyway..."
>> >
>> > My thought is to annotate the PMC DOAP files with a registered marker,
>> > then use the existing projects.a.o building of the organized data.  Then
>> > use either JS or some cron static generation to display the actual
>> > marks/list page.
Re: Adding asfext:registered to projects.a.o?
On 11 February 2016 at 12:03, Shane Curcuru wrote:
> I need to annotate our structured data set of Apache projects to track
> which project names are registered trademarks.  This is needed to be
> able to properly generate a.o/foundation/marks/list (which is currently
> sadly outdated since it's manually built now).  This is a serious need
> for Brand Management, since we regularly have third parties say "but you
> didn't SAY it was your trademark, so I can do it anyway..."
>
> My thought is to annotate the PMC DOAP files with a registered marker,
> then use the existing projects.a.o building of the organized data.  Then
> use either JS or some cron static generation to display the actual
> marks/list page.

There are two kinds of RDF files:
- the PMC RDF files [1] which are mainly stored in the comdev area
  [2], though they can also be stored elsewhere.
  The locations of the files are held in committees.xml [3]
  [These are not actually DOAP files, though the format looks similar.]

- the project DOAP files which are stored by individual projects; they
  are listed in projects.xml [4]

A single PMC RDF file can be associated with multiple DOAP files, e.g.
Commons, Creadur, Tomcat all have multiple independent project
releases.

> Is annotating the project data sources the best idea, or should I simply
> create a new stable URL data source that's just a list of registered
> names, and join the tables?

I doubt if either of the above file types are suitable.
The location of the index XML files [3], [4] has already been changed
once (when projects-new was established).

DOAP files are located all over the place and are often moved within
the SCM without updating the index file.
If they are located in the source tree there are often multiple copies
in different branches.

PMC RDF files may not be updateable except by the project (if located
in their SCM), and again may move without warning if they are not in
[2].

It would potentially be possible to recover the PMC RDF files from
their external locations and insist that they only be stored in the
comdev area.
But a single PMC may have multiple marks. Potentially also a project
may move from a PMC to become its own PMC.

Therefore I think a separate file is needed.
That would also allow write access to be limited if necessary.

> The end result needs to be webcontent listing projects like:
>
> The ASF claims these trademarks
> ...list all active TLPs
> Apache {$projectname}
> {$if registered then "®" else "™"}
>
>
> {$shortdesc}
> ...
> The following projects are retired
> ...list all Attic projects
>
> The following projects are in incubation; all trademarks here may be
> property of respective owners
> ...list all Incubation projects
>
> Separately, we should list the name of each software *product* here,
> since if we offer something with a clear name as an independently
> downloadable software product, it can be our trademark.  So I'd like to
> list "Apache Directory Studio", since that's a notable name and a major
> product.  But I don't want to list "Apache Commons Foo Bar Baz and
> Kitchensink", since those are effectively just minor components that
> aren't really worth claiming.
>
> Comments/suggestions please?  I'm including the Whimsical project since
> they are also major consumers of this data.
>
> - Shane

[1] https://projects.apache.org/pmc_rdf.html
[2] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees/
[3] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/committees.xml
[4] https://svn.apache.org/repos/asf/comdev/projects.apache.org/data/projects.xml
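
One way to picture the "separate file" argument above, as a rough sketch only: keep one record per mark, each pointing at its owning PMC, and join that against the committee list. The record fields, the committee ids, and the "Apache Foo" names below are hypothetical and are not taken from committees.xml or from any real registration data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Mark:
    """One row of the hypothetical marks file."""
    name: str         # display name of the mark
    pmc: str          # owning committee id (assumed to match the committee list)
    registered: bool  # True only for a completed registration


def marks_by_pmc(marks):
    """Group marks under their owning PMC; one PMC may hold several."""
    grouped = {}
    for mark in marks:
        grouped.setdefault(mark.pmc, []).append(mark)
    return grouped


def join_with_committees(committee_ids, marks):
    """Left-join the committee list against the marks file.

    Committees with no entry get an empty list instead of being dropped.
    """
    grouped = marks_by_pmc(marks)
    return {cid: grouped.get(cid, []) for cid in committee_ids}


# Illustrative data only; names and flags are made up.
MARKS = [
    Mark("Apache Foo", "foo", True),
    Mark("Apache Foo Studio", "foo", False),
    Mark("Apache Bar", "bar", True),
]

if __name__ == "__main__":
    joined = join_with_committees(["foo", "bar", "baz"], MARKS)
    for cid, owned in joined.items():
        print(cid, [m.name for m in owned])
```

Because each mark carries its own PMC reference, a project that moves out to become its own PMC would only need that one field changed.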
Re: Adding asfext:registered to projects.a.o?
On Thu, Feb 11, 2016 at 11:35 AM, sebb wrote:
> On 11 February 2016 at 12:03, Shane Curcuru wrote:
>> I need to annotate our structured data set of Apache projects to track
>> which project names are registered trademarks.  This is needed to be
>> able to properly generate a.o/foundation/marks/list (which is currently
>> sadly outdated since it's manually built now).  This is a serious need
>> for Brand Management, since we regularly have third parties say "but you
>> didn't SAY it was your trademark, so I can do it anyway..."
>>
>> My thought is to annotate the PMC DOAP files with a registered marker,
>> then use the existing projects.a.o building of the organized data.  Then
>> use either JS or some cron static generation to display the actual
>> marks/list page.
>
> There are two kinds of RDF files:
> - the PMC RDF files [1] which are mainly stored in the comdev area
>   [2], though they can also be stored elsewhere.
>   The locations of the files are held in committees.xml [3]
>   [These are not actually DOAP files, though the format looks similar.]
>
> - the project DOAP files which are stored by individual projects; they
>   are listed in projects.xml [4]
>
> A single PMC RDF file can be associated with multiple DOAP files, e.g.
> Commons, Creadur, Tomcat all have multiple independent project
> releases.
>
>> Is annotating the project data sources the best idea, or should I simply
>> create a new stable URL data source that's just a list of registered
>> names, and join the tables?
>
> I doubt if either of the above file types are suitable.
> The location of the index XML files [3], [4] has already been changed
> once (when projects-new was established).
>
> DOAP files are located all over the place and are often moved within
> the SCM without updating the index file.
> If they are located in the source tree there are often multiple copies
> in different branches.
>
> PMC RDF files may not be updateable except by the project (if located
> in their SCM), and again may move without warning if they are not in
> [2].
>
> It would potentially be possible to recover the PMC RDF files from
> their external locations and insist that they only be stored in the
> comdev area.
> But a single PMC may have multiple marks. Potentially also a project
> may move from a PMC to become its own PMC.
>
> Therefore I think a separate file is needed.
> That would also allow write access to be limited if necessary.

There are indeed multiple ways to solve this, and each way involves a
tradeoff.

I would suggest separating this question into three parts.

- - -

First, where is the ultimate source for the data?  And the best way to
address that question is to first decide who will be updating that
data.  Will it be each project, or those on the branding mailing list,
or only VP brand?  Knowing the answer to that question will make a big
difference.

My suggestion would be to start simple with a single file, in the same
directory as committee-info.txt.  I'd suggest YAML as a format as it
is a good tradeoff between human edit-ability and programmatic
parse-ability.

- - -

Next is access.  What you need is something that takes the data from
the private repository, sanitizes it, and publishes the result for
public consumption.  Whimsy has a bunch of cron jobs that place
similar data here: https://whimsy.apache.org/public/.  A script that
parses a YAML file out of SVN, selects and filters out various parts,
and publishes the results in JSON format is very doable.

- - -

Finally, there is publishing.  While that could be a cron job that
produces static HTML, web browsers have the ability to consume JSON
and format the results.  That's probably the best solution to this.

- - -

The Apache Phone book is an example of an application that uses the
above design:

https://home.apache.org/phonebook.html

In fact, if the data is made available in this manner, the trademark
information could be included directly in the results of the page it
produces.  That's one of the nice things about having a public JSON
version of the data published - multiple tools can consume that data.

- Sam Ruby

>> The end result needs to be webcontent listing projects like:
>>
>> The ASF claims these trademarks
>> ...list all active TLPs
>> Apache {$projectname}
>> {$if registered then "®" else "™"}
>>
>>
>> {$shortdesc}
>> ...
>> The following projects are retired
>> ...list all Attic projects
>>
>> The following projects are in incubation; all trademarks here may be
>> property of respective owners
>> ...list all Incubation projects
>>
>> Separately, we should list the name of each software *product* here,
>> since if we offer something with a clear name as an independently
>> downloadable software product, it can be our trademark.  So I'd like to
>> list "Apache Directory Studio", since that's a notable name and a major
>> product.  But I don't want to list "Apache Commons Foo Bar Baz and
>> Kitchensink", since those are
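
A minimal sketch of the "sanitize, then publish as JSON" step from the message above, not an actual Whimsy job. It assumes the source file has already been parsed into a list of dicts (with PyYAML installed, `yaml.safe_load()` could produce that); the field names, the whitelist, and the sample records are all hypothetical.

```python
import json

# Fields assumed safe to publish; anything else in a record is dropped.
PUBLIC_FIELDS = ("name", "status")


def sanitize(records):
    """Keep only the whitelisted fields from each source record."""
    return [
        {key: rec[key] for key in PUBLIC_FIELDS if key in rec}
        for rec in records
    ]


def publish(records):
    """Serialize the sanitized records as stable, diff-friendly JSON text."""
    return json.dumps({"marks": sanitize(records)}, indent=2, sort_keys=True)


# Stand-in for the parsed source file; names and values are made up.
SOURCE = [
    {"name": "Apache Foo", "status": "registered", "notes": "counsel ref"},
    {"name": "Apache Bar", "status": "unregistered"},
]

if __name__ == "__main__":
    # A cron job could write this text to a public location instead.
    print(publish(SOURCE))
```

Whitelisting fields, rather than blacklisting private ones, means a new private field added to the source later stays private by default.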
Re: Adding asfext:registered to projects.a.o?
Sam Ruby wrote on 2/11/16 12:28 PM:
> On Thu, Feb 11, 2016 at 11:35 AM, sebb wrote:
>> On 11 February 2016 at 12:03, Shane Curcuru wrote:
>>> I need to annotate our structured data set of Apache projects to track
>>> which project names are registered trademarks.  This is needed to be
>>> able to properly generate a.o/foundation/marks/list ...
...
>> Therefore I think a separate file is needed.
>> That would also allow write access to be limited if necessary.
>
> There are indeed multiple ways to solve this, and each way involves a
> tradeoff.
>
> I would suggest separating this question into three parts.
>
> - - -
>
> First, where is the ultimate source for the data.  And the best way to
> address that question is to first decide who will be updating that
> data.  Will it be each project, or those on the branding mailing list,
> or only VP brand?  Knowing the answer to that question will make a big
> difference.
>
> My suggestion would be to start simple with a single file, in the same
> directory as committee-info.txt.  I'd suggest YAML as a format as it
> is a good tradeoff between human edit-ability and programmatic
> parse-ability.

The raw data of which TLP names are registered can be public; it's
already findable in various national registries.  I may want to add an
additional enum "application-submitted", but even that can be public.

Theoretically just the brand committee should update the file, but in
reality we can restrict to members; I don't think they'll mess anything
up.  The file won't change that often, but changes will be manual (i.e.
when we hear from counsel about applications).

>
> - - -
>
> Next is access.  What you need is something that takes the data from
> the private repository, sanitizes it, and publishes the result for
> public consumption.  Whimsy has a bunch of cron jobs that places
> similar data here: https://whimsy.apache.org/public/.  A script that
> parses a YAML file out of SVN, selects and filters out various parts,
> and publishes the results in JSON format is very doable.

It can go in a public repository if that makes it easier.  Of course,
this data isn't technically owned by any one project, so we need to find
a home for it, unless I should just dump it in the a.o site.  Is there
any overall place for structured data about corporate operations
currently?

>
> ---
>
> Finally, there is publishing.  While that could be a cron job that
> produces static HTML, web browsers have the ability to consume JSON
> and format the results.  That's probably the best solution to this.

Thinking it through, we should fold this data into a number of places:

- The marks/list page, which needs to be regenerated each month after
  the board meeting formally graduates or attics projects.  It likely
  has low traffic to the page itself, but needs to be accurate, because
  lawyers are the kind of people who will read it.

- projects.a.o, where it would be really nice to annotate project names
  with the appropriate ® and ™ symbols.  As this service becomes more
  popular, having clear trademark indicators for our projects will help
  ensure that third parties know (and can verify) that the ASF takes its
  trademarks seriously.

- www.a.o homepage, where whatever parts of the main site are generated
  in any fashion should include the appropriate ® and ™ symbols.

I figure the first thing is to come up with the schema and the location
of the source YAML/JSON file, then engineer the display into marks/list
or the main projects.a.o stuff.  Then see where to go from there.

>
> ---
>
> The Apache Phone book is an example of an application that uses the
> above design:
>
> https://home.apache.org/phonebook.html
>
> In fact, if the data is made available in this manner, the trademark
> information could be included directly in the results of the page it
> produces.  That's one of the nice things about having a public JSON
> version of the data published - multiple tools can consume that data.

Yeah, as we get more of these useful sites, it would be nice to fold
this in so it just gets automatically included.  It's especially
important for registered marks, because some countries require use of
the (R).

- Shane
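
To make the status enum and the symbol annotation discussed above concrete, here is a rough sketch. The three status values (including "application-submitted"), the record fields, and the rule that only a completed registration gets ® are assumptions for illustration, not an agreed schema.

```python
from enum import Enum


class Status(Enum):
    """Hypothetical status values for a mark."""
    UNREGISTERED = "unregistered"
    APPLICATION_SUBMITTED = "application-submitted"
    REGISTERED = "registered"


def symbol(status):
    """Assumed rule: only a completed registration gets the (R) symbol."""
    return "\u00ae" if status is Status.REGISTERED else "\u2122"


def render_marks(projects):
    """Render plain-text lines such as 'Apache Foo(R) - short description'."""
    lines = ["The ASF claims these trademarks"]
    for proj in sorted(projects, key=lambda p: p["name"]):
        # Records without a status are treated as unregistered.
        status = Status(proj.get("status", "unregistered"))
        lines.append(f"{proj['name']}{symbol(status)} - {proj['shortdesc']}")
    return "\n".join(lines)


if __name__ == "__main__":
    # Made-up records; real data would come from the published JSON.
    print(render_marks([
        {"name": "Apache Foo", "status": "registered", "shortdesc": "demo"},
        {"name": "Apache Bar", "status": "application-submitted",
         "shortdesc": "another demo"},
    ]))
```

The same mapping could be reused by marks/list, projects.a.o, and the homepage, so that all three show the same symbol for a given name.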