Strict Search in Apache Solr
How could Solr accomplish an end-user behavior like a strict search? Let’s say an end-user decides to use quotation marks in their keywords to provide specificity in their search results.

Current: If you were to query: your future, then 10 results would return and print to the page.

Expected: I’d like to query: “your future”, then fewer than 10 results would return and print to the page.

Regards, Mark

IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages sent from Bridgepoint Education may contain information that is confidential and may be legally privileged. Please do not read, copy, forward or store this message unless you are an intended recipient of it. If you received this transmission in error, please notify the sender by reply e-mail and delete the message and any attachments.
Re: Strict Search in Apache Solr
Okay, let’s try it this way…

CURRENTLY:
Step 1: Type your future into the search bar.
Step 2: 10 search results return.

I’D LIKE TO SEE THIS:
Step 1: Type “your future” into the search bar.
Step 2: 1 search result returns.

Can this be accomplished through the Solr UI?

Thanks, Mark

On 5/5/14, 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Reyes, I think your question is not clear. Please see: https://wiki.apache.org/solr/UsingMailingLists

Ahmet

On Tuesday, May 6, 2014 12:23 AM, Reyes, Mark mark.re...@bpiedu.com wrote:

How could Solr accomplish an end-user behavior like a strict search? Let’s say an end-user decides to use quotation marks in their keywords to provide specificity in their search results.

Current: If you were to query: your future, then 10 results would return and print to the page.

Expected: I’d like to query: “your future”, then less than 10 results would return and print to the page.

Regards, Mark
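One way to approach the behavior described above is a phrase query: on an analyzed text field, wrapping the terms in double quotes asks Solr to match them as a phrase rather than as independent terms, which normally narrows the result set. The sketch below only shows how the two request URLs differ. The host, port, core name (collection1) and the helper name select_url are assumptions for illustration, not details from the thread.

```python
from urllib.parse import urlencode

# Assumed endpoint; host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def select_url(user_input, strict=False):
    """Build a select URL; strict=True sends the input as a quoted phrase."""
    if strict:
        # Strip any quotes the user already typed, then add exactly one pair.
        q = '"%s"' % user_input.strip('"')
    else:
        q = user_input
    return SOLR_SELECT + "?" + urlencode({"q": q, "wt": "json"})


loose_url = select_url("your future")
strict_url = select_url("your future", strict=True)
print(loose_url)
print(strict_url)
```

Whether the phrase form returns exactly one result depends on the index and on how the field is analyzed, so the counts in the question should be treated as the desired outcome rather than something this sketch guarantees.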
Re: Indexing URLs for Binaries
Check suffix-urlfilter.txt in your conf directory for Nutch. You might be prohibiting those filetypes from the crawl.

- Mark

On 1/3/14, 10:29 AM, Teague James teag...@insystechinc.com wrote:

I am using Nutch 1.7 with Solr 4.6.0 to index websites that have links to binary files, such as Word, PDF, etc. The crawler crawls the site but I am not getting the URLs of the links for the binary files no matter how deep I set the settings for the site. I see the labels for the links in the content, but not the URLs. Any ideas on how I could get those URLs back in my crawl?
Proxy.php tutorials for AJAX Solr
Are there any good tutorials that touch base on how to integrate the suggested PHP proxy for JavaScript framework AJAX Solr?

Here is the proxy, https://gist.github.com/evolvingweb/298580

Also on Stackoverflow, http://stackoverflow.com/questions/20338073/proxy-php-tutorials-for-ajax-solr
Re: Client-side proxy for Solr 4.5.0
What about using some JSONP techniques since the results in the Solr instance rest as key/value pairs?

On 11/26/13, 10:53 AM, Markus Jelsma markus.jel...@openindex.io wrote:

I don't think you mean client-side proxy. You need a server-side layer such as a normal web application or a good proxy. We use Nginx; it is very fast and very feature rich. Its config scripting is usually enough to restrict access and limit input parameters. We also use Nginx's embedded Perl and Lua scripting besides its config scripting to implement more difficult logic.

-Original message-
From: Reyes, Mark mark.re...@bpiedu.com
Sent: Tuesday 26th November 2013 19:27
To: solr-user@lucene.apache.org
Subject: Client-side proxy for Solr 4.5.0

Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so that the end-user can see their queries w/o being able to directly access :8983?

Applications/frameworks used:
- Solr 4.5.0
- AJAX Solr (javascript library)

Thank you, Mark
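For context on the JSONP idea: Solr's JSON response writer accepts a json.wrf parameter that wraps the response in a named callback function. The sketch below builds such a request URL and unwraps a JSONP-style body. The core name, the callback name handleResults and both helper names are assumptions for illustration. Note that JSONP by itself does not hide port 8983 from the browser, so it does not replace the server-side layer Markus describes.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint; host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def jsonp_url(query, callback):
    """Build a select URL that asks Solr to wrap its JSON in `callback`."""
    params = {"q": query, "wt": "json", "json.wrf": callback}
    return SOLR_SELECT + "?" + urlencode(params)


def unwrap_jsonp(text, callback):
    """Strip the callback wrapper from a JSONP body and parse the JSON."""
    text = text.strip()
    prefix = callback + "("
    if not (text.startswith(prefix) and text.endswith(")")):
        raise ValueError("not a JSONP response for " + callback)
    return json.loads(text[len(prefix):-1])


url = jsonp_url("doctorate", "handleResults")
# Hand-written sample body, not real Solr output.
sample = 'handleResults({"response": {"numFound": 1, "docs": []}})'
parsed = unwrap_jsonp(sample, "handleResults")
print(url)
print(parsed["response"]["numFound"])
```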
Client-side proxy for Solr 4.5.0
Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so that the end-user can see their queries w/o being able to directly access :8983?

Applications/frameworks used:
- Solr 4.5.0
- AJAX Solr (javascript library)

Thank you, Mark
Indexing data to a specific collection in Solr 4.5.0
Hi all:

I’m currently on a Solr 4.5.0 instance and running this tutorial, http://lucene.apache.org/solr/4_5_0/tutorial.html

My question is specific to indexing data as proposed from this tutorial,

$ java -jar post.jar solr.xml monitor.xml

The tutorial advises to validate from your localhost, http://localhost:8983/solr/collection1/select?q=solr&wt=xml

However, what if my Solr instance has both a collection1 and a collection2, yet I want the XML files to be posted to collection2 only? If possible, please advise.

Thanks, Mark
Re: Indexing data to a specific collection in Solr 4.5.0
So then,

$ java -Durl=http://localhost:8983/solr/collection2/update -jar post.jar solr.xml monitor.xml

On 11/21/13, 8:14 AM, xiezhide xiezh...@gmail.com wrote:

Add -Durl=http://localhost:8983/solr/collection2/update when you run post.jar.

(Sent from 189 Mail)

Reyes, Mark mark.re...@bpiedu.com wrote:

Hi all:

I’m currently on a Solr 4.5.0 instance and running this tutorial, http://lucene.apache.org/solr/4_5_0/tutorial.html

My question is specific to indexing data as proposed from this tutorial,

$ java -jar post.jar solr.xml monitor.xml

The tutorial advises to validate from your localhost, http://localhost:8983/solr/collection1/select?q=solr&wt=xml

However, what if my Solr core has both a collection1 and collection2, yet I desire the XML files to only be posted to collection2 only? If possible, please advise.

Thanks, Mark
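The url property works because each core has its own update handler under its own path. For comparison only, here is a minimal Python sketch that prepares a plain HTTP POST to one core's update handler. This is a different technique from post.jar, not how post.jar is implemented. The base URL, the helper names and the empty `<add>` payload are assumptions for illustration.

```python
from urllib.request import Request

# Assumed base URL; adjust host and port to your own instance.
SOLR_BASE = "http://localhost:8983/solr"


def update_url(core):
    """Return the update handler URL for the named core."""
    return "%s/%s/update" % (SOLR_BASE, core)


def build_update_request(core, xml_bytes):
    """Prepare (but do not send) a POST of Solr XML to one core."""
    return Request(
        update_url(core) + "?commit=true",
        data=xml_bytes,
        headers={"Content-Type": "application/xml"},
    )


# Placeholder payload; a real one would be the contents of solr.xml etc.
req = build_update_request("collection2", b"<add></add>")
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```

Afterwards the same validation query from the tutorial can be pointed at collection2 instead of collection1 to confirm where the documents landed.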
Nutch 1.7 solrdedup error
When trying to delete duplicates after crawl I get the following, http://pastebin.com/aQbqmPLm

When running this command on terminal:

$ bin/nutch solrdedup http://localhost:8983/solr/rockies

Here is my setup:
- Nutch 1.7
- Solr 4.5.0
- java version 1.6.0_51

On Stackoverflow as well, http://stackoverflow.com/questions/20013630/nutch-1-7-solrdedup-error

Thanks, Mark
Nutch 1.7 + AJAX Solr returning ALL contents vs. SPECIFIC
Hi:

I was encouraged to explore the Solr mail list, specifically regarding the fl parameter. What is that parameter for, and can it accomplish my original task of crawling/indexing specific HTML components versus parsing the entire page?

My original question is listed below (previously on the Nutch mail list):

---
I’m using Nutch 1.7 to crawl/index the pages of my domain to Solr, and the JavaScript library AJAX Solr to capture that index as JSON, which is then printed to the front-end. My question is whether it’s possible to have specific content returned (i.e. an H2 tag and a p tag) on the search results page versus all contents of that page?
---

Thanks again, Mark
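On the fl question: fl ("field list") is a Solr query parameter that limits which stored fields are returned for each matching document. It does not change what Nutch crawls or how a page is parsed, so it can only select among fields that already exist in the index. A minimal sketch of a request using it follows; the core name and the field names title and url are assumptions for illustration and should be checked against the actual schema.

```python
from urllib.parse import urlencode

# Assumed endpoint; host, port and core name are placeholders.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def select_with_fields(query, fields):
    """Build a select URL that returns only the named stored fields."""
    params = {"q": query, "wt": "json", "fl": ",".join(fields)}
    return SOLR_SELECT + "?" + urlencode(params)


# Hypothetical field names; use whatever the schema actually defines.
fl_url = select_with_fields("programs", ["title", "url"])
print(fl_url)
```

Returning only an H2 and a p tag would therefore need those pieces to be indexed as their own fields at crawl time; fl alone cannot carve them out of a single page-content field.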
Setting up Multiple Cores on Solr 4.5.0
Any good/recent documentation that I can reference on setting up multiple cores in Solr 4.5.0?

Thanks all, Mark
Exclude urls without 'www' from Nutch 1.7 crawl
I'm currently using Nutch 1.7 to crawl my domain. My issue is specific to URLs being indexed as www vs. non-www.

Specifically, after firing the crawl and index to Solr 4.5, then validating the results on the front-end with AJAX Solr, the search results page lists results/pages under both 'www' and non-'www' URLs, such as:

www.mywebsite.com
mywebsite.com
www.mywebsite.com/page1
mywebsite.com/page1

My understanding is that the URL filtering (regex-urlfilter.txt) needs modification. Are there any regex/Nutch experts that could suggest a solution?

Here is the code on pastebin, http://pastebin.com/Cp6vUxPR

Also on Stack Overflow, http://stackoverflow.com/questions/19731904/exclude-urls-without-www-from-nutch-1-7-crawl

Thank you, Mark
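Before editing regex-urlfilter.txt, it can help to try a candidate pattern against the sample URLs. The sketch below does that in Python for a hypothetical accept-only-www rule. The pattern, the http scheme on the sample URLs and the variable names are assumptions for illustration, and Python's re module stands in for the Java regex engine Nutch uses, which behaves the same way for a pattern this simple.

```python
import re

# Hypothetical rule: keep only URLs on the www host. In regex-urlfilter.txt
# an accept rule is written with a leading "+", for example:
#   +^https?://www\.mywebsite\.com/
WWW_ONLY = re.compile(r"^https?://www\.mywebsite\.com(/|$)")

urls = [
    "http://www.mywebsite.com",
    "http://mywebsite.com",
    "http://www.mywebsite.com/page1",
    "http://mywebsite.com/page1",
]

# Keep the URLs the rule accepts; everything else would be filtered out.
kept = [u for u in urls if WWW_ONLY.search(u)]
for u in kept:
    print(u)
```

Rules in that file are evaluated in order and the first match decides, so a rule like this only has an effect if a later catch-all accept rule does not let the non-www URLs back in.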
Re: Exclude urls without 'www' from Nutch 1.7 crawl
Noted and will do (that goes twice for the suggestions and putting this on the Nutch list instead).

Thanks all, Mark

On 11/1/13, 10:53 AM, Furkan KAMACI furkankam...@gmail.com wrote:

As Markus pointed out, Nutch has a feature for this kind of situation. This is the Solr list, but one more thing for you: www.mywebsite.com and mywebsite.com may point to different pages.

2013/11/1 Markus Jelsma markus.jel...@openindex.io

Hi - Use the domain-urlfilter for host, domain and TLD filtering. Also, please ask questions on the Nutch list, you're on Solr now :)

-Original message-
From: Reyes, Mark mark.re...@bpiedu.com
Sent: Friday 1st November 2013 17:24
To: solr-user@lucene.apache.org
Subject: Exclude urls without 'www' from Nutch 1.7 crawl

I'm currently using Nutch 1.7 to crawl my domain. My issue is specific to URLs being indexed as www vs. non-www.

Specifically, after firing the crawl and index to Solr 4.5 then validating the results on the front-end with AJAX Solr, the search results page lists results/pages that are both 'www' and '' urls such as:

www.mywebsite.com
mywebsite.com
www.mywebsite.com/page1
mywebsite.com/page1

My understanding is that the url filtering (regex-urlfilter.txt) needs modification. Are there any regex/nutch experts that could suggest a solution?

Here is the code on paste bin, http://pastebin.com/Cp6vUxPR

Also on stack overflow, http://stackoverflow.com/questions/19731904/exclude-urls-without-www-from-nutch-1-7-crawl

Thank you, Mark
Re: AJAX Solr returning the default wildcard *:* and not what I query
I just tweaked the reuters.js example to listen to the window.location object and it resolved the wildcard returns. I put it on pastebin, http://pastebin.com/GyC4RMva

Thanks for the reply everyone, Mark

--- P. 866.475.0317 x 3244 Bridgepoint Education INNOVATIVE SOLUTIONS THAT ADVANCE LEARNING SM

On 10/31/13, 12:23 AM, Raymond Wiker rwi...@gmail.com wrote:

The parameters indicate a jQuery.ajax call with result type jsonp - a script tag is inserted into the web page, where the script url contains the actual query parameters. This should be pretty painless to debug in Google Chrome and Safari, at least - these two browsers have pretty neat debug/inspection capabilities.

On Wed, Oct 30, 2013 at 9:07 PM, Anshum Gupta ans...@anshumgupta.net wrote:

As Shawn pointed out, it seems like your client is actually sending out *:* queries all of the time. You perhaps have the wrong id for the search box, or something else that results in your ajax library never actually receiving the input value, but I'm just guessing.

On Thu, Oct 31, 2013 at 1:25 AM, Reyes, Mark mark.re...@bpiedu.com wrote:

solr.log file per Solr 4.5 http://pastebin.com/zSpERJZA

Thanks Shawn, Mark

On 10/30/13, 12:44 PM, Shawn Heisey s...@elyograg.org wrote:

On 10/30/2013 1:26 PM, Reyes, Mark wrote:

I am currently integrating JavaScript framework AJAX Solr to my domain. I am trying to query words such as 'doctorate' or 'programs' but the console is reporting '*:*' only the default wildcard. Just curious if anyone has any helpful hints?

The problem can be seen in detail on Stackoverflow, http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query

We would have to know what Solr is actually receiving from your app. The Solr log should have an entry for every query you do, and it includes all of the parameters for that query. This is *not* the Logging tab in the admin UI, but the actual logfile. On Solr 4.3 and later with the example logging setup, this is typically $CWD/logs/solr.log.

Thanks, Shawn

-- Anshum Gupta http://www.anshumgupta.net
AJAX Solr returning the default wildcard *:* and not what I query
I am currently integrating JavaScript framework AJAX Solr to my domain. I am trying to query words such as 'doctorate' or 'programs' but the console is reporting '*:*' only the default wildcard. Just curious if anyone has any helpful hints?

The problem can be seen in detail on Stackoverflow, http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query

Thank you, Mark
Re: AJAX Solr returning the default wildcard *:* and not what I query
solr.log file per Solr 4.5 http://pastebin.com/zSpERJZA

Thanks Shawn, Mark

On 10/30/13, 12:44 PM, Shawn Heisey s...@elyograg.org wrote:

On 10/30/2013 1:26 PM, Reyes, Mark wrote:

I am currently integrating JavaScript framework AJAX Solr to my domain. I am trying to query words such as 'doctorate' or 'programs' but the console is reporting '*:*' only the default wildcard. Just curious if anyone has any helpful hints?

The problem can be seen in detail on Stackoverflow, http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query

We would have to know what Solr is actually receiving from your app. The Solr log should have an entry for every query you do, and it includes all of the parameters for that query. This is *not* the Logging tab in the admin UI, but the actual logfile. On Solr 4.3 and later with the example logging setup, this is typically $CWD/logs/solr.log.

Thanks, Shawn