Re: Granary & SocketHub

2016-10-20 Thread Benjamin Young
Then it sounds like we're decidedly on the same page...and now to the work! ;)


Also (for later): http://www.hydra-cg.com/


It's more or less what you're after, I think: descriptions of endpoints and 
HTTP-based actions to be taken on them. There are certainly others like this 
out there also, but most are more limited and more focused on documenting or 
describing one's own API vs. doing that same thing for any API anywhere + 
Linked Data. ^_^


Thanks, Steve.


Excited!

Benjamin


From: sblackmon 
Sent: Thursday, October 20, 2016 3:06:21 PM
To: dev@streams.incubator.apache.org; Benjamin Young
Cc: Matt Franklin
Subject: Re: Granary & SocketHub



On October 20, 2016 at 1:33:33 PM, Benjamin Young 
(byo...@bigbluehat.com) wrote:

Great points, Steven.


What's always attracted me to Apache Streams is its descriptiveness (via JSON 
Schema documents) vs. prescriptive-ness. Granary's approach is (currently? ;) 
) more prescriptive:

https://github.com/snarfed/granary/blob/master/granary/twitter.py

vs.

https://github.com/apache/streams/tree/STREAMS-26/streams-contrib/streams-provider-twitter

...which is mostly (though not all) a collection of .json and .conf files with 
a handful of .java files needed (afaict) for last-mile integration with one's 
tool.


The future I dream about is one where I can pick my tool for my idiosyncratic 
language, operating system, license reasons, but they'll all work off shared, 
descriptive "knowledge" documents.


I think we (the active streams developers) agree that data descriptor and 
translation rules should ideally be
a) human-readable
b) machine-readable
c) programming language-agnostic
d) internationalized
e) compliant with web standards, or community standards where suitable web 
standards don’t exist

With regard to operating system flexibility, java byte-code and docker 
containers are as universal as any other run-time platform, with the possible 
exception of javascript.

Apache License 2.0 is about as permissive as they come, so hopefully no 
concerns there?

Otherwise, we're all pulling separately, and end up with snowflake systems to 
process snowflake APIs. However, I also know it's unlikely everyone will come 
"under one roof" to work on things. My hope, though, is that the output of this 
group (and Granary and Sockethub and...) will be re-usable by as wide an 
audience as possible--hence the value of description over prescription (at 
least in my book ;) ).


A major driver behind using json schemas and hocon snippets so widely 
throughout the project, and the reason we push all of them onto the website 
with each snapshot and each release, is to facilitate re-use and re-mixing of 
those artifacts in other projects using URIs.

Ideally it should be possible for some other project to build a system with 
black-box behavior identical to Apache Streams in another language with much 
less code by piggy-backing on these public resources.  This could even extend 
to the mechanics of collection, if we specified how to submit and parse HTTP 
requests with text resources rather than using java sdks.  We’ve got a long way 
to go to reach this objective, but philosophically I’m all for maximizing the 
re-usability of this codebase and supporting complementary efforts.
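
A minimal sketch of that idea, assuming a translation rule is published as a 
plain JSON document. The rule below is hypothetical: its field names are 
illustrative and not taken from the Apache Streams schemas, and a real consumer 
would fetch it from a URI rather than inline it. Python is used only to show 
that a small interpreter in any language could consume the same text resource.

```python
import json

# Hypothetical translation rule, inlined here for the sketch; in practice it
# would be a text resource fetched from a public URI. Keys are dotted paths in
# the output activity, values are dotted paths in the source document.
# These field names are illustrative assumptions, not Apache Streams schemas.
RULE_JSON = """
{
  "id": "id_str",
  "published": "created_at",
  "actor.id": "user.id_str",
  "actor.displayName": "user.screen_name",
  "object.content": "text"
}
"""


def get_path(doc, path):
    """Follow a dotted path through nested dicts; None if any key is missing."""
    for key in path.split("."):
        if not isinstance(doc, dict) or key not in doc:
            return None
        doc = doc[key]
    return doc


def set_path(doc, path, value):
    """Set a value at a dotted path, creating intermediate dicts as needed."""
    keys = path.split(".")
    for key in keys[:-1]:
        doc = doc.setdefault(key, {})
    doc[keys[-1]] = value


def translate(source, rule, verb="post"):
    """Build an activity-shaped dict from a source document and a rule."""
    activity = {"verb": verb}
    for target, source_path in rule.items():
        value = get_path(source, source_path)
        if value is not None:  # skip fields the source does not carry
            set_path(activity, target, value)
    return activity


rule = json.loads(RULE_JSON)
tweet = {
    "id_str": "1",
    "created_at": "2016-10-20T13:26:38Z",
    "text": "hello",
    "user": {"id_str": "42", "screen_name": "example"},
}
activity = translate(tweet, rule)
```

The only language-specific part here is the small interpreter; everything 
about the source format lives in the rule document, which is the part that 
could be shared across projects.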

Granted, if I'm barking up the wrong tree (again), I'm happy to wander off...


Please don’t!  Stay and help!

Is anything in the above sane? ;)


I think so!  P.S. I’m hopeful that bringing this code-base into compliance with 
AS 2.0 and JSON-LD will open up the world of RDF tools and specs and allow more 
of what’s currently expressible only with source code to be expressed with W3C 
standards compliant resource files published on the web going forward.

Cheers!

Benjamin

--

http://bigbluehat.com/

http://linkedin.com/in/benjaminyoung


From: sblackmon 
Sent: Thursday, October 20, 2016 1:26:38 PM
To: dev@streams.incubator.apache.org
Cc: Matt Franklin; Benjamin Young
Subject: Re: Granary & SocketHub

On October 18, 2016 at 6:09:49 PM, Matt Franklin 
(m.ben.frank...@gmail.com) wrote:
On Tue, Oct 18, 2016 at 10:39 AM Benjamin Young 
wrote:

> (resending from the correct account…likely the other got spammed…)
>
> Granary is a project with similar ideas and intents as Apache Streams
> (which also needs AS2 support ;) ):
> https://github.com/snarfed/granary
>

Ryan from Granary is on the list I think.  Hey Ryan!  Cool stuff, too bad it’s 
python :)

> In fact Apache Streams gets a mention in their “Related Work” section:
> https://github.com/snarfed/granary#related-work
>
> Also mentioned in the Granary related work section is SocketHub:
> https://github.com/sockethub/sockethub
>

Cool stuff, too bad it’s LGPL :)

> It’s aims a

Re: Granary & SocketHub

2016-10-20 Thread sblackmon

On October 20, 2016 at 1:33:33 PM, Benjamin Young (byo...@bigbluehat.com) wrote:

Great points, Steven.



What's always attracted me to Apache Streams is its descriptiveness (via JSON 
Schema documents) vs. prescriptive-ness. Granary's approach is (currently? ;) 
) more prescriptive:

https://github.com/snarfed/granary/blob/master/granary/twitter.py

vs.

https://github.com/apache/streams/tree/STREAMS-26/streams-contrib/streams-provider-twitter

...which is mostly (though not all) a collection of .json and .conf files with 
a handful of .java files needed (afaict) for last-mile integration with one's 
tool.



The future I dream about is one where I can pick my tool for my idiosyncratic 
language, operating system, license reasons, but they'll all work off shared, 
descriptive "knowledge" documents.



I think we (the active streams developers) agree that data descriptor and 
translation rules should ideally be
a) human-readable
b) machine-readable
c) programming language-agnostic
d) internationalized
e) compliant with web standards, or community standards where suitable web 
standards don’t exist

With regard to operating system flexibility, java byte-code and docker 
containers are as universal as any other run-time platform, with the possible 
exception of javascript.

Apache License 2.0 is about as permissive as they come, so hopefully no 
concerns there?

Otherwise, we're all pulling separately, and end up with snowflake systems to 
process snowflake APIs. However, I also know it's unlikely everyone will come 
"under one roof" to work on things. My hope, though, is that the output of this 
group (and Granary and Sockethub and...) will be re-usable by as wide an 
audience as possible--hence the value of description over prescription (at 
least in my book ;) ).



A major driver behind using json schemas and hocon snippets so widely 
throughout the project, and the reason we push all of them onto the website 
with each snapshot and each release, is to facilitate re-use and re-mixing of 
those artifacts in other projects using URIs.  

Ideally it should be possible for some other project to build a system with 
black-box behavior identical to Apache Streams in another language with much 
less code by piggy-backing on these public resources.  This could even extend 
to the mechanics of collection, if we specified how to submit and parse HTTP 
requests with text resources rather than using java sdks.  We’ve got a long way 
to go to reach this objective, but philosophically I’m all for maximizing the 
re-usability of this codebase and supporting complementary efforts.

Granted, if I'm barking up the wrong tree (again), I'm happy to wander off...



Please don’t!  Stay and help!

Is anything in the above sane? ;)



I think so!  P.S. I’m hopeful that bringing this code-base into compliance with 
AS 2.0 and JSON-LD will open up the world of RDF tools and specs and allow more 
of what’s currently expressible only with source code to be expressed with W3C 
standards compliant resource files published on the web going forward.

Cheers!

Benjamin

--

http://bigbluehat.com/

http://linkedin.com/in/benjaminyoung

From: sblackmon 
Sent: Thursday, October 20, 2016 1:26:38 PM
To: dev@streams.incubator.apache.org
Cc: Matt Franklin; Benjamin Young
Subject: Re: Granary & SocketHub
 
On October 18, 2016 at 6:09:49 PM, Matt Franklin (m.ben.frank...@gmail.com) 
wrote:
On Tue, Oct 18, 2016 at 10:39 AM Benjamin Young  
wrote: 

> (resending from the correct account…likely the other got spammed…) 
> 
> Granary is a project with similar ideas and intents as Apache Streams 
> (which also needs AS2 support ;) ): 
> https://github.com/snarfed/granary 
> 
Ryan from Granary is on the list I think.  Hey Ryan!  Cool stuff, too bad it’s 
python :)


> In fact Apache Streams gets a mention in their “Related Work” section: 
> https://github.com/snarfed/granary#related-work 
> 
> Also mentioned in the Granary related work section is SocketHub: 
> https://github.com/sockethub/sockethub 
> 
Cool stuff, too bad it’s LGPL :)


> Its aims are similar, but it’s reaching way beyond Web-based social APIs 
> and “back” to including things like IRC, Email, etc. 


Non-SNS data sources are important for sure. I’ve posted some work on my 
personal github using the streams framework to parse MBOX files - 
https://github.com/steveblackmon/streams-apache - and to collect quantified 
self data - https://github.com/steveblackmon/humanapi-streams
IRC is interesting as well.


> What’s significant about both these projects (and others they link to) are
> the stories they’re telling developers—which we can crib from as we think
> about the Streams “pitch.” They also have relatively minimal setup
> docs—which Streams is also heading toward (go Steve!).
>

Agreed this is key


The existence of other open-source projects with similar 

Re: Granary & SocketHub

2016-10-20 Thread Benjamin Young
Great points, Steven.


What's always attracted me to Apache Streams is its descriptiveness (via JSON 
Schema documents) vs. prescriptive-ness. Granary's approach is (currently? ;) 
) more prescriptive:

https://github.com/snarfed/granary/blob/master/granary/twitter.py

vs.

https://github.com/apache/streams/tree/STREAMS-26/streams-contrib/streams-provider-twitter

...which is mostly (though not all) a collection of .json and .conf files with 
a handful of .java files needed (afaict) for last-mile integration with one's 
tool.


The future I dream about is one where I can pick my tool for my idiosyncratic 
language, operating system, license reasons, but they'll all work off shared, 
descriptive "knowledge" documents.


Otherwise, we're all pulling separately, and end up with snowflake systems to 
process snowflake APIs. However, I also know it's unlikely everyone will come 
"under one roof" to work on things. My hope, though, is that the output of this 
group (and Granary and Sockethub and...) will be re-usable by as wide an 
audience as possible--hence the value of description over prescription (at 
least in my book ;) ).


Granted, if I'm barking up the wrong tree (again), I'm happy to wander off...


Is anything in the above sane? ;)


Cheers!

Benjamin

--

http://bigbluehat.com/

http://linkedin.com/in/benjaminyoung


From: sblackmon 
Sent: Thursday, October 20, 2016 1:26:38 PM
To: dev@streams.incubator.apache.org
Cc: Matt Franklin; Benjamin Young
Subject: Re: Granary & SocketHub

On October 18, 2016 at 6:09:49 PM, Matt Franklin 
(m.ben.frank...@gmail.com) wrote:
On Tue, Oct 18, 2016 at 10:39 AM Benjamin Young 
wrote:

> (resending from the correct account...likely the other got spammed...)
>
> Granary is a project with similar ideas and intents as Apache Streams
> (which also needs AS2 support ;) ):
> https://github.com/snarfed/granary
>

Ryan from Granary is on the list I think.  Hey Ryan!  Cool stuff, too bad it's 
python :)

> In fact Apache Streams gets a mention in their "Related Work" section:
> https://github.com/snarfed/granary#related-work
>
> Also mentioned in the Granary related work section is SocketHub:
> https://github.com/sockethub/sockethub
>

Cool stuff, too bad it's LGPL :)

> Its aims are similar, but it's reaching way beyond Web-based social APIs
> and "back" to including things like IRC, Email, etc.


Non-SNS data sources are important for sure. I've posted some work on my 
personal github using the streams framework to parse MBOX files - 
https://github.com/steveblackmon/streams-apache - and to collect quantified 
self data - https://github.com/steveblackmon/humanapi-streams
IRC is interesting as well.


> What's significant about both these projects (and others they link to) are
> the stories they're telling developers--which we can crib from as we think
> about the Streams "pitch." They also have relatively minimal setup
> docs--which Streams is also heading toward (go Steve!).
>

Agreed this is key


The existence of other open-source projects with similar themes suggests we're 
onto an important problem.  We should pay attention to these projects and what 
is working for them WRT user growth, community growth, tech media coverage, 
etc...


>
> Again, my key objective is to understand the Apache Streams vision
> alongside projects like these and within the wider space of consolidating social
> data. What market does it serve? Is it "personal" (as these projects seem
> to be)? Or commercial? Or developer-only (library/framework for wiring up
> your own idiosyncratic stuff)?
>

I think the overall objective of streams remains very similar to what it
started as: A way to easily and flexibly ingest multiple different sources
of 'activity' data in a normalized ActivityStreams format. For me
personally, my interest is in ingesting this data at scale and with as
little internally-maintained code as possible.

While most of the development so far has been geared toward enabling back-end / 
commercial-scale data collection and management, I think the future should be 
more about enabling individuals and businesses to transcend data silos using 
computing resources and code entirely under their own control. This might mean 
supporting regular users with a full-featured SaaS application in addition to 
continued work on data interoperability.


>
> Thanks for reading, pondering, and helping me help. :)
>
> Cheers!
> Benjamin
>
> --
> http://bigbluehat.com/
> http://linkedin.com/in/benjaminyoung
>
>


Re: Granary & SocketHub

2016-10-20 Thread sblackmon
On October 18, 2016 at 6:09:49 PM, Matt Franklin (m.ben.frank...@gmail.com) 
wrote:
On Tue, Oct 18, 2016 at 10:39 AM Benjamin Young  
wrote: 

> (resending from the correct account…likely the other got spammed…) 
> 
> Granary is a project with similar ideas and intents as Apache Streams 
> (which also needs AS2 support ;) ): 
> https://github.com/snarfed/granary 
> 
Ryan from Granary is on the list I think.  Hey Ryan!  Cool stuff, too bad it’s 
python :)


> In fact Apache Streams gets a mention in their “Related Work” section: 
> https://github.com/snarfed/granary#related-work 
> 
> Also mentioned in the Granary related work section is SocketHub: 
> https://github.com/sockethub/sockethub 
> 
Cool stuff, too bad it’s LGPL :)


> Its aims are similar, but it’s reaching way beyond Web-based social APIs 
> and “back” to including things like IRC, Email, etc. 


Non-SNS data sources are important for sure. I’ve posted some work on my 
personal github using the streams framework to parse MBOX files - 
https://github.com/steveblackmon/streams-apache - and to collect quantified 
self data - https://github.com/steveblackmon/humanapi-streams
IRC is interesting as well.


> What’s significant about both these projects (and others they link to) are  
> the stories they’re telling developers—which we can crib from as we think  
> about the Streams “pitch.” They also have relatively minimal setup  
> docs—which Streams is also heading toward (go Steve!).  
>  

Agreed this is key  


The existence of other open-source projects with similar themes suggests we’re 
onto an important problem.  We should pay attention to these projects and what 
is working for them WRT user growth, community growth, tech media coverage, 
etc...


>  
> Again, my key objective is to understand the Apache Streams vision
> alongside projects like these and within the wider space of consolidating social
> data. What market does it serve? Is it “personal” (as these projects seem  
> to be)? Or commercial? Or developer-only (library/framework for wiring up  
> your own idiosyncratic stuff)?  
>  

I think the overall objective of streams remains very similar to what it  
started as: A way to easily and flexibly ingest multiple different sources  
of 'activity' data in a normalized ActivityStreams format. For me  
personally, my interest is in ingesting this data at scale and with as  
little internally-maintained code as possible.  

While most of the development so far has been geared toward enabling back-end / 
commercial-scale data collection and management, I think the future should be 
more about enabling individuals and businesses to transcend data silos using 
computing resources and code entirely under their own control. This might mean 
supporting regular users with a full-featured SaaS application in addition to 
continued work on data interoperability.


>  
> Thanks for reading, pondering, and helping me help. :)  
>  
> Cheers!  
> Benjamin  
>  
> --  
> http://bigbluehat.com/  
> http://linkedin.com/in/benjaminyoung  
>  
>  


Re: Granary & SocketHub

2016-10-18 Thread Matt Franklin
On Tue, Oct 18, 2016 at 10:39 AM Benjamin Young 
wrote:

> (resending from the correct account…likely the other got spammed…)
>
> Granary is a project with similar ideas and intents as Apache Streams
> (which also needs AS2 support ;) ):
> https://github.com/snarfed/granary
>
> In fact Apache Streams gets a mention in their “Related Work” section:
> https://github.com/snarfed/granary#related-work
>
> Also mentioned in the Granary related work section is SocketHub:
> https://github.com/sockethub/sockethub
>
> Its aims are similar, but it’s reaching way beyond Web-based social APIs
> and “back” to including things like IRC, Email, etc.


> What’s significant about both these projects (and others they link to) are
> the stories they’re telling developers—which we can crib from as we think
> about the Streams “pitch.” They also have relatively minimal setup
> docs—which Streams is also heading toward (go Steve!).
>

Agreed this is key


>
> Again, my key objective is to understand the Apache Streams vision
> alongside projects like these and within the wider space of consolidating social
> data. What market does it serve? Is it “personal” (as these projects seem
> to be)? Or commercial? Or developer-only (library/framework for wiring up
> your own idiosyncratic stuff)?
>

I think the overall objective of streams remains very similar to what it
started as:  A way to easily and flexibly ingest multiple different sources
of 'activity' data in a normalized ActivityStreams format.  For me
personally, my interest is in ingesting this data at scale and with as
little internally-maintained code as possible.


>
> Thanks for reading, pondering, and helping me help. :)
>
> Cheers!
> Benjamin
>
> --
> http://bigbluehat.com/
> http://linkedin.com/in/benjaminyoung
>
>