[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-28 Thread Alexandre Rafalovitch (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874944#comment-16874944
 ] 

Alexandre Rafalovitch commented on SOLR-13571:
--

We could definitely do a sitemap.

But we could also update the redirect list and see whether that makes much 
difference. I had a quick look in the infra repo and it seems to be two files 
(solr_id_to_new.map.txt and solr_name_to_new.map.txt). These seem to correspond 
to those we generated in SOLR-10595. So perhaps we just need to review those 
files for target file name changes (they may be 99% the same) and ask Infra to 
refresh the files with a new URL base of 8.1. 
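
A minimal sketch of such a refresh, under stated assumptions: the format of 
the Infra map files is not shown in this thread, so the sketch assumes a 
hypothetical "<key> <target>" layout per line and a target base of 
/solr/guide/6_6/. The function name is illustrative only. It swaps the base 
and does nothing about renamed pages, which would still need manual review.

```python
def refresh_map_lines(lines, old_base="/solr/guide/6_6/", new_base="/solr/guide/8_1/"):
    """Return map lines with the target URL base swapped.

    Assumes a hypothetical "<key> <target>" line format. Blank lines,
    '#' comments, and lines without a target are passed through unchanged.
    """
    refreshed = []
    for line in lines:
        parts = line.strip().split(None, 1)
        if len(parts) != 2 or parts[0].startswith("#"):
            refreshed.append(line)
            continue
        key, target = parts
        # Only the target side is rewritten; the lookup key stays as-is.
        refreshed.append(f"{key} {target.replace(old_base, new_base, 1)}")
    return refreshed


sample = [
    "About+This+Guide /solr/guide/6_6/about-this-guide.html",
    "# comment lines are kept",
]
print("\n".join(refresh_map_lines(sample)))
```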

Also, if we could get access to the Google Webmaster tools, that would be nice. 
That can be done by publishing a file to the server. Can we do that outside of 
a full publication process?

Finally, if we republish 6.6 with an additional canonical header pointing to 
the latest version (8.1 or whatever), this may also refocus the search ranking. 
The work for that would probably be identical to that required to redo the maps. 
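
As an illustration of the canonical idea, here is a sketch that adds a 
rel="canonical" link element to a generated HTML page. It assumes the page 
keeps the same file name in the target version and that the 8_1 base URL is 
the desired target. The function name and default are hypothetical and not 
part of the existing build.

```python
def add_canonical(html, page_name, latest_base="https://lucene.apache.org/solr/guide/8_1/"):
    """Insert a rel="canonical" link just before </head>, if none is present."""
    if 'rel="canonical"' in html:
        return html  # page already declares a canonical URL
    head_end = html.find("</head>")
    if head_end == -1:
        return html  # no <head> section found; leave the page untouched
    tag = f'<link rel="canonical" href="{latest_base}{page_name}">\n'
    return html[:head_end] + tag + html[head_end:]


page = "<html><head><title>About This Guide</title></head><body></body></html>"
print(add_canonical(page, "about-this-guide.html"))
```

Pages that were renamed or removed after 6.6 would need an explicit mapping, 
which is why this overlaps with reviewing the redirect maps.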


> Make recent RefGuide rank well in Google
> 
>
> Key: SOLR-13571
> URL: https://issues.apache.org/jira/browse/SOLR-13571
> Project: Solr
>  Issue Type: Improvement
>  Security Level: Public(Default Security Level. Issues are Public) 
>  Components: documentation
>Reporter: Jan Høydahl
>Priority: Major
>
> Spinoff from SOLR-13548
> The old Confluence ref-guide has a lot of pages pointing to it, and all of 
> that link karma is delegated to the {{/solr/guide/6_6/}} html ref guide, 
> making it often rank top. However we'd want newer content to rank high. See 
> these comments for some first ideas.



--
This message was sent by Atlassian JIRA
(v7.6.3#76005)

-
To unsubscribe, e-mail: dev-unsubscr...@lucene.apache.org
For additional commands, e-mail: dev-h...@lucene.apache.org



[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-28 Thread Jan Høydahl (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874861#comment-16874861
 ] 

Jan Høydahl commented on SOLR-13571:


[~sokolov] a sitemap.xml is also an interesting idea, perhaps the easiest to 
try first, i.e. publish a sitemap that gives tons of weight to the 8_1 guide 
and decreasing weight the older the version gets. Or only mention the newest? 
If it plays out well then that's all we need.
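
A rough sketch of such a sitemap generator, using the sitemaps.org 0.9 
format. The 0.2 step per version, the 0.1 floor, and the function name are 
arbitrary choices for illustration, and the sketch assumes every page exists 
in every listed version. Search engines treat <priority> as a hint at most, 
so this would be an experiment rather than a guaranteed fix.

```python
from xml.sax.saxutils import escape


def build_sitemap(pages, versions, base="https://lucene.apache.org/solr/guide/"):
    """Build sitemap XML; `versions` must be ordered newest first."""
    entries = []
    for rank, version in enumerate(versions):
        # Newest version gets 1.0; each older one drops by 0.2, floor 0.1.
        priority = max(0.1, 1.0 - 0.2 * rank)
        for page in pages:
            loc = escape(f"{base}{version}/{page}")
            entries.append(
                f"  <url><loc>{loc}</loc><priority>{priority:.1f}</priority></url>"
            )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">\n'
        + "\n".join(entries)
        + "\n</urlset>\n"
    )


print(build_sitemap(["solrcloud.html"], ["8_1", "8_0", "7_7"]))
```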




[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-27 Thread Jan Høydahl (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874548#comment-16874548
 ] 

Jan Høydahl commented on SOLR-13571:


{quote}I'm not totally against the idea of having a "latest", but I don't quite 
get why it can't be a redirect?
{quote}
Today the "latest" redirect hack is not a landing page of its own, and it uses 
a 302 redirect, which I believe will not pass on page rank to the target.

Take [https://lucene.apache.org/solr/guide/about-this-guide.html] which now 
redirects to [https://lucene.apache.org/solr/guide/8_1/about-this-guide.html]. 
Now users will start sharing the 8_1 link, and in a few years we will have the 
same issue again: the 8_1 guide will have a lot of credit. Since the URL in the 
browser changes, the version-less URL is hard to bookmark and copy, so it won't 
get much use in the wild.

If, on the other hand, we had a 
[https://lucene.apache.org/solr/guide/latest/about-this-guide.html] landing 
page, we could move the cwiki 301 redirect 
([https://cwiki.apache.org/confluence/display/solr/About+This+Guide]) to the 
new stable location. I'm not sure though if Google already has moved all the 
rank points to the 6_6 HTML url or if moving the redirects again will suddenly 
make the /latest/ urls rank high. If the 6_6 guide still has all the points we 
could of course redirect all 6_6 links to "latest" as well, but then the 6_6 
guide would be unreachable :). To fix that we could re-release the 6_6 guide 
under e.g. 6_6_0.

The extra effort if we choose such a model is:
 * Copy the generated guide twice to the release repo, to two different locations
 * Make sure page renames are handled, e.g. as I proposed above, by tracking when 
a page that existed before no longer exists in the to-be-published guide, and 
then adding a redirect for it to the latest version that had that page, or adding 
a dummy page with a link on it. This would be scripted as part of the release 
process: make a tool comparing the page tree between two versions.
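
A minimal sketch of that comparison tool, assuming the page list of each 
published version is available as a set of file names (how those sets are 
collected, for example by listing each version's directory, is left out). 
The function name and the /solr/guide/<version>/ target layout are 
assumptions for illustration.

```python
def plan_redirects(pages_by_version, new_version):
    """Map each dropped page to the newest older version that still has it.

    pages_by_version: dict of version -> set of page file names, with
    versions inserted oldest to newest (dicts preserve insertion order).
    """
    new_pages = pages_by_version[new_version]
    older = [v for v in pages_by_version if v != new_version]
    redirects = {}
    for version in reversed(older):  # newest older version first
        for page in pages_by_version[version] - new_pages:
            # setdefault keeps the first (i.e. newest) version found.
            redirects.setdefault(page, f"/solr/guide/{version}/{page}")
    return redirects


pages = {
    "8_0": {"a.html", "old.html"},
    "8_1": {"a.html", "b.html", "gone.html"},
    "8_2": {"a.html", "b.html"},
}
for page, target in sorted(plan_redirects(pages, "8_2").items()):
    print(page, "->", target)
```

The resulting mapping could then be turned into redirect rules or dummy 
pages under "latest" as part of the release script.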




[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-27 Thread Mike Sokolov (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874290#comment-16874290
 ] 

Mike Sokolov commented on SOLR-13571:
-

Have we ever tried publishing a site map? Google used to have a feature that 
would read an XML file that described all the pages on the site as a hint to 
its crawler. Also, I wonder if we have ever checked out Google webmaster tools 
for the documentation site(s). 




[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-27 Thread Alexandre Rafalovitch (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874253#comment-16874253
 ] 

Alexandre Rafalovitch commented on SOLR-13571:
--

I guess one place to start thinking this through is how important it is that 
users find the reference manual. As a reference point, Stack Overflow (and the 
rest of the network) focuses more on being discovered by Google than on its 
internal search engines. Obviously, they have to, as that's where the money and 
attention are. But it is still an interesting explicit goal post.

For us, if the users cannot find a relevant reference guide page quickly, they 
may
* think a particular feature does not exist
* join and ask on the User Mailing list
* discover the reference guide in general and browse through it
* discover the reference guide and use our - still limited - internal search

None of the options above seem optimal compared to leveraging the public search 
engine. But then, we have to worry about SEO. Clearly, the current SEO works 
well enough to get us to the 6.6 version of the guide and - very importantly - 
to a somewhat relevant page. Switching that to be a single target page would be 
easier for us, but may cost a lot of SEO. And, frankly, I am not at all sure 
that our guide is SEO-friendly enough on its own. I just did a search for 
MappingCharFilterFactory (as an example) and 6.6 RefGuide is at the top 
followed by (old) Javadoc, (old) Wiki, two source-code class links and then 
random websites and blogs. The latest version's link just does not seem to 
appear in the first couple of pages (though a 7.x clone of the RefGuide on some 
Chinese community site does).

I suspect that Google is detecting multiple guide versions as duplicate content 
and therefore only displays one version and the 6.6 version has more weight due 
to redirects. But if we remove/collapse that link, I am not sure if the 
correct/latest version of the manual will be picked up. This feels risky to me. 

I don't know what the optimal solution is, given the limited resources 
available for this part of the project. I am just really worried that lost 
Google ranking is hard to get back. Perhaps, as a minimum step, we could just 
refresh the URL map periodically to use whatever the latest version is.




[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-27 Thread Cassandra Targett (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16874198#comment-16874198
 ] 

Cassandra Targett commented on SOLR-13571:
--

I'm not totally against the idea of having a "latest", but I don't quite get 
why it can't be a redirect? My gut reaction is that it further complicates the 
release process, and since I'm pretty much the only one who ever does it (with 
one recent exception), I'd like to be very sure that additional steps are 
necessary. I'd be more likely to get on board if you were able to spell out the 
specific changes to the release process that this would cause.

Maybe it would be simpler to ask Infra to just change that big list of 
redirects to go to one single page that says "You have a link to the old 
version of the Ref Guide, here's where the latest versions are." Or just have 
it go to https://lucene.apache.org/solr/guide/. I mean, it's the internet - 
stuff moves and life pretty much goes on.

Related to that idea, we need to institute a proper 404 page and redirect rule 
for it.

There are also a large number of duplicated files in each release - CSS, fonts, 
images. I have been recently thinking I'd like to restructure everything so we 
stop uploading things that are very unlikely to change from release-to-release, 
but that is way beyond the scope here, and I don't have any concrete ideas 
there yet.

I think it's worth asking if the value we'd get here is worth the effort of 
more steps to the process and more duplication of content. It's been 3 years 
since we moved. I agree that having the 6.6 Guide rank highest is not good. But 
perhaps we can fix that in a simpler way?





[jira] [Commented] (SOLR-13571) Make recent RefGuide rank well in Google

2019-06-24 Thread Jan Høydahl (JIRA)


[ 
https://issues.apache.org/jira/browse/SOLR-13571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16870915#comment-16870915
 ] 

Jan Høydahl commented on SOLR-13571:


Ideally I'd like us to have a real copy of the RefGuide at 
[https://lucene.apache.org/solr/guide/latest/] instead of today's redirect, 
which rewrites the URL to the latest version if you omit the X_Y part. It would 
still be possible to permalink to a specific version of the guide, but e.g. 
[https://lucene.apache.org/solr/guide/latest/solrcloud.html] would contain the 
8_1 guide right now, and then once 8_2 is released, we would publish it to both 
the "8_2" subfolder and the "latest" subfolder. The rank authority of that 
"latest" URL would then remain, and over time hopefully grow strong. It would 
also make it way easier for people to link to the latest version if they do not 
care about version.

We'd obviously need to sort out how to handle URL renames and deletions. Part 
of the release process could perhaps be to generate a list of all pages in 
existing "latest" and new guide to be released, and for every page that existed 
in X_Y but not in newest X_Z, we'd add a redirect rule to X_Y for that specific 
page, to make sure we don't break too many links on the "latest" guide.

[~arafalov], [~ctargett]
