Re: [datameet] How about One Mega Collection for all the Datasets we collect!

2020-03-03 Thread Saketha Ramanujam
@Anand, you mentioned making the datasets discoverable using schema.org
markup. My concern is that, with the UI Datasette offers, we might not be
able to add the markup directly without modifying the Datasette templates
themselves.
Cataloging them in a GitHub repo sounds like a good idea.

---
Saketh.

-- 
Datameet is a community of Data Science enthusiasts in India. Know more about 
us by visiting http://datameet.org
--- 
You received this message because you are subscribed to the Google Groups 
"datameet" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to datameet+unsubscr...@googlegroups.com.
To view this discussion on the web visit 
https://groups.google.com/d/msgid/datameet/CAHzVmyuBQgb%2BxushopguATekxtnamCxnrO3Kw3vzx3gt11Fwig%40mail.gmail.com.


Re: [datameet] How about One Mega Collection for all the Datasets we collect!

2020-03-02 Thread Bodhisattwa Mandal
Hi,

Wikidata can be a place to add the datasets, using OpenRefine and other
tools. Data put there is easily searchable through the Wikidata Query
Service. The database is free and open to everyone and is one of the best
platforms for linked data.
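
As a rough sketch of what a search through the Wikidata Query Service could
look like, the Python below only builds a request URL for the public SPARQL
endpoint; it does not send it. The helper name build_wdqs_url and the example
query (items whose country, P17, is India, Q668) are illustrative assumptions,
not something taken from this thread.

```python
from urllib.parse import urlencode

# Public SPARQL endpoint of the Wikidata Query Service.
WDQS_ENDPOINT = "https://query.wikidata.org/sparql"


def build_wdqs_url(country_qid: str = "Q668", limit: int = 5) -> str:
    """Build (but do not send) a query URL listing items in a country.

    Hypothetical helper for illustration: P17 is Wikidata's "country"
    property and Q668 is the item for India.
    """
    query = (
        "SELECT ?item ?itemLabel WHERE { "
        f"?item wdt:P17 wd:{country_qid} . "
        'SERVICE wikibase:label { bd:serviceParam wikibase:language "en". } '
        f"}} LIMIT {limit}"
    )
    # urlencode percent-encodes the query so it is safe in a URL.
    return WDQS_ENDPOINT + "?" + urlencode({"query": query, "format": "json"})


print(build_wdqs_url())
```

A real query would narrow this down with further properties; the point is
only that anything ingested into Wikidata becomes reachable through one
query endpoint.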

Data generated by Datameet is already being ingested into Wikidata for some
parts of the country.

Regards,
Bodhisattwa




Re: [datameet] How about One Mega Collection for all the Datasets we collect!

2020-03-02 Thread Akshay S Dinesh
When I first saw the headline, I imagined a mega dataset in which variables
that are the same across datasets are linked together somehow. (That would
have been very relevant for our work at Metastring Foundation.)
Anyhow.

BTW, https://magda.io/ is a federated approach to dataset curation by the
same people behind https://data.gov.au/

Akshay



Re: [datameet] How about One Mega Collection for all the Datasets we collect!

2020-03-02 Thread Anand Chitipothu
On Mon, Mar 2, 2020 at 6:03 PM Saketha Ramanujam <
saketh.ramanuja...@gmail.com> wrote:

> Hello all,
>
> This is Saketha Ramanujam, an independent researcher from Visakhapatnam.
> I've been a passive member of the discussions that happen in this group.
> Many thanks to those who post prompt and useful responses to questions
> on various topics.
>
>
> Many of us, individually or as teams, have compiled, scraped and
> prepared a lot of datasets which might not be readily available.
> If somebody else wants to access them, or check whether some dataset has
> already been scraped/compiled by someone, it's kind of hard right now.
> They either send an email here or spend a lot of time searching for
> it. My idea is that we have a collection of datasets displayed/hosted
> using tools like Datasette.
>
> Please let me know if it is actually possible for us to do something
> like this as a community.
>

The problem is not lack of centralization, but lack of discoverability.

I think the first step is to publish the datasets that you already have.
Datasette is a good tool for that.
The second step is to make them discoverable. Google has a dataset search
service[1] that is based on schema.org markup[2], which is pretty
easy to include on your web page.
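
For illustration, here is a minimal sketch of the kind of schema.org markup
meant here, written as JSON-LD and generated with Python. The function name
dataset_jsonld and all the field values are placeholders I have assumed;
only the schema.org type (Dataset) and property names (name, description,
url, license) are real vocabulary.

```python
import json


def dataset_jsonld(name: str, description: str, url: str, license_url: str) -> str:
    """Return a <script> tag carrying schema.org Dataset markup as JSON-LD."""
    markup = {
        "@context": "https://schema.org/",
        "@type": "Dataset",
        "name": name,
        "description": description,
        "url": url,
        "license": license_url,
    }
    # Escape "</" so a stray "</script>" in a field cannot end the tag early.
    body = json.dumps(markup, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{body}\n</script>'


# Placeholder values, not a real dataset.
print(dataset_jsonld(
    name="Example district boundaries",
    description="Hypothetical dataset used only to illustrate the markup.",
    url="https://example.org/datasets/districts",
    license_url="https://creativecommons.org/licenses/by/4.0/",
))
```

The resulting tag would go into the HTML of the page that describes the
dataset; where exactly depends on how that page is generated.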

In addition to that, it would probably be a good idea to send a mail to
this list when you publish a dataset. It may also be a good idea to keep a
catalog of all the datasets referenced on this mailing list in a GitHub
repo.
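
One possible shape for such a catalog, sketched in Python under my own
assumptions: a single JSON file kept in the repo, one record per dataset.
The file layout, the field names (name, url, posted_by, tags) and the
helpers add_entry and search are all hypothetical.

```python
import json
from pathlib import Path


def add_entry(catalog_path: Path, name: str, url: str,
              posted_by: str, tags: list[str]) -> None:
    """Append one dataset record to a JSON catalog file, creating it if needed."""
    entries = json.loads(catalog_path.read_text()) if catalog_path.exists() else []
    entries.append({"name": name, "url": url, "posted_by": posted_by, "tags": tags})
    # Sorted, indented output keeps diffs small when the file lives in git.
    entries.sort(key=lambda e: e["name"].lower())
    catalog_path.write_text(json.dumps(entries, indent=2) + "\n")


def search(catalog_path: Path, keyword: str) -> list[dict]:
    """Return entries whose name or tags contain the keyword (case-insensitive)."""
    keyword = keyword.lower()
    entries = json.loads(catalog_path.read_text())
    return [
        e for e in entries
        if keyword in e["name"].lower()
        or any(keyword in t.lower() for t in e["tags"])
    ]
```

A plain file like this can be edited through pull requests, so adding a
dataset would not need any tooling beyond git.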

Any thoughts?

[1]: https://datasetsearch.research.google.com/
[2]: https://support.google.com/webmasters/thread/1960710

Anand

[datameet] How about One Mega Collection for all the Datasets we collect!

2020-03-02 Thread Saketha Ramanujam
Hello all,

This is Saketha Ramanujam, an independent researcher from Visakhapatnam.
I've been a passive member of the discussions that happen in this group.
Many thanks to those who post prompt and useful responses to questions
on various topics.


Many of us, individually or as teams, have compiled, scraped and
prepared a lot of datasets which might not be readily available.
If somebody else wants to access them, or check whether some dataset has
already been scraped/compiled by someone, it's kind of hard right now.
They either send an email here or spend a lot of time searching for
it. My idea is that we have a collection of datasets displayed/hosted
using tools like Datasette.
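
To make the idea concrete, here is a small sketch of one way a CSV file
could be turned into a SQLite database, which is the format Datasette
serves. It uses only the Python standard library; the function name
csv_to_sqlite is my own, and it assumes the CSV has a header row and every
row has the same number of fields.

```python
import csv
import sqlite3


def quote_ident(name: str) -> str:
    """Quote a SQLite identifier so names with spaces still work."""
    return '"' + name.replace('"', '""') + '"'


def csv_to_sqlite(csv_path: str, db_path: str, table: str) -> int:
    """Load a CSV file (with a header row) into a SQLite table; return row count."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        header = next(reader)
        rows = list(reader)

    columns = ", ".join(quote_ident(h) for h in header)
    placeholders = ", ".join("?" for _ in header)

    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(
                f"CREATE TABLE IF NOT EXISTS {quote_ident(table)} ({columns})"
            )
            conn.executemany(
                f"INSERT INTO {quote_ident(table)} VALUES ({placeholders})", rows
            )
    finally:
        conn.close()
    return len(rows)
```

The resulting file could then be opened with something like
"datasette datasets.db" to browse it in a web UI. All values are stored as
text here; type handling is left out to keep the sketch short.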

Please let me know if it is actually possible for us to do something
like this as a community.


Thanks,
Saketha Ramanujam
