Re: [Dspace-tech] Problem with DSpace 1.5, 1.5.1 prevents indexing by search engines

2009-02-25 Thread Bram Luyten
Great, thanks Rob.

I had already tried site:dspace.mit.edu/handle, which is basically the same as
inurl:handle, but show=full can indeed make the difference between
community/collection pages and item pages.

best regards,

Bram

@mire NV
Romeinse Straat 18
3001 Heverlee
Belgium
+32 2 888 29 56

http://www.atmire.com - Institutional Repository Solutions
http://www.togather.eu - Before getting together, get t...@ther


DSpace-tech mailing list
DSpace-tech@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/dspace-tech


Re: [Dspace-tech] Problem with DSpace 1.5, 1.5.1 prevents indexing by search engines

2009-02-19 Thread Robert Tansley
You won't get entirely accurate numbers but you can get ballpark figures
with e.g.

site:dspace.mit.edu inurl:handle inurl:show=full

Basically this narrows things down to the full item record pages. It looks
like there may be duplicates in there, so you could try some additional conditions.

For the number of bitstreams:

site:dspace.mit.edu inurl:bitstream

Hope this helps

Rob
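
The two queries above follow a fixed pattern, so they can be generated per repository host. A minimal sketch in Python: the function names are made up for illustration, and the https://www.google.com/search?q= URL form is an assumption used only to open the query in a browser. The result counts still have to be read off the results page, and they remain ballpark figures.

```python
from urllib.parse import urlencode


def item_query(host: str) -> str:
    """Query that narrows results to full item record pages on one host."""
    return f"site:{host} inurl:handle inurl:show=full"


def bitstream_query(host: str) -> str:
    """Query that narrows results to bitstream (file download) URLs on one host."""
    return f"site:{host} inurl:bitstream"


def search_url(query: str) -> str:
    """Wrap a query in a search URL (assumed URL form, for browser use only)."""
    return "https://www.google.com/search?" + urlencode({"q": query})


if __name__ == "__main__":
    # dspace.mit.edu is the host used in the message above.
    for build in (item_query, bitstream_query):
        print(search_url(build("dspace.mit.edu")))
```

Keeping the query builders separate from the URL wrapper makes it easy to add further conditions to a query without touching the encoding step.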

On Thu, Feb 19, 2009 at 05:47, Bram Luyten bluy...@gmail.com wrote:

 Hi Rob,

 I have a question somewhat related to robots.txt and the way DSpace
 instances are being indexed by Google.

 As part of the Google Analytics - DSpace comparison that I've been
 running, I would like to analyse which repositories are being indexed best
 by Google, and how that impacts their number of visits.

 As a first, very rough estimate, I searched for:

 site:<repository url>

 to get an indication of how many useful pages were indexed. Interestingly,
 these numbers did not really correlate with visits to the repository.
 I assumed that for many repositories, various browse pages were being
 indexed, and that these indexed pages were not very useful for generating
 visits or exposing the content.

 In a second step, I tried searching for site:<repository url> -browse.
 The returned numbers were in some cases less than half of the original
 number.
 But I realise this search is too restrictive: because many pages
 include the word browse in their navigation bar, I'm probably excluding
 useful item pages etc. from the results.

 So my question is the following:
 which search query could I use in Google to get the number of useful
 indexed pages (item pages, bitstreams, collection and community
 pages)?

 Already an interesting finding from my research:
 the 15 repositories included in the research so far get 60% of their
 visits through search engines (average calculated on the visits in December
 2008). So there is even more reason to make exposure through search engines
 as good as possible.

 best regards,

 Bram





[Dspace-tech] Problem with DSpace 1.5, 1.5.1 prevents indexing by search engines

2009-02-05 Thread Robert Tansley
To all users of DSpace 1.5 and DSpace 1.5.1:
These versions of DSpace ship with a bad robots.txt file that prevents
search engines such as Google Scholar or Yahoo from indexing any content on
a DSpace site. To check if this applies to you:
- Visit your site's robots.txt -- http://your_dspace_hostname.edu/robots.txt
- If you see the following line you have a bad robots.txt:

Disallow: /browse

It is important that you REMOVE this line from your robots.txt to ensure
that your DSpace instance is correctly indexed by search engines. More info
on ensuring your DSpace site is correctly indexed here:

http://wiki.dspace.org/index.php?title=Ensuring_your_instance_is_indexed

Robert Tansley / Google
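
The check described above can also be run against the text of a robots.txt file. A minimal sketch using Python's standard urllib.robotparser module: the function name blocks_browse and the two sample strings are illustrative assumptions, and downloading the file from your own host is left out so the example runs offline.

```python
from urllib.robotparser import RobotFileParser


def blocks_browse(robots_txt: str, user_agent: str = "*") -> bool:
    """Return True if the given robots.txt text forbids crawling /browse."""
    parser = RobotFileParser()
    # parse() takes the file content as an iterable of lines.
    parser.parse(robots_txt.splitlines())
    # can_fetch() answers whether this user agent may request the path.
    return not parser.can_fetch(user_agent, "/browse")


# Hypothetical sample contents, not copied from a real DSpace install.
BAD_ROBOTS = "User-agent: *\nDisallow: /browse\n"
FIXED_ROBOTS = "User-agent: *\nDisallow:\n"

if __name__ == "__main__":
    print("bad file blocks /browse:", blocks_browse(BAD_ROBOTS))
    print("fixed file blocks /browse:", blocks_browse(FIXED_ROBOTS))
```

To test a live site, paste the content served at http://your_dspace_hostname.edu/robots.txt into a string and pass it to blocks_browse.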