Re: Strict Search in Apache Solr
Okay, let’s try it this way…

CURRENTLY:
Step 1: Type your future (no quotes) into the search bar.
Step 2: 10 search results return.

I’D LIKE TO SEE THIS:
Step 1: Type “your future” (with quotes) into the search bar.
Step 2: 1 search result returns.

Can this be accomplished through the Solr UI?

Thanks,
Mark

On 5/5/14, 3:17 PM, "Ahmet Arslan" wrote:

> Hi Reyes,
>
> I think it is not clear your question.
> Please see: https://wiki.apache.org/solr/UsingMailingLists
>
> Ahmet
>
> On Tuesday, May 6, 2014 12:23 AM, "Reyes, Mark" wrote:
> How could Solr accomplish an end-user behavior like a strict search?
>
> Let’s say an end-user decides to use quotation marks in their keywords to provide specificity in their search results.
>
> Current:
> If you were to query: your future, then 10 results would return and print to the page.
>
> Expected:
> I’d like to query: “your future”, then less than 10 results would return and print to the page.
>
> Regards,
> Mark
>
> IMPORTANT NOTICE: This e-mail message is intended to be received only by persons entitled to receive the confidential information it may contain. E-mail messages sent from Bridgepoint Education may contain information that is confidential and may be legally privileged. Please do not read, copy, forward or store this message unless you are an intended recipient of it. If you received this transmission in error, please notify the sender by reply e-mail and delete the message and any attachments.
Strict Search in Apache Solr
How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to provide specificity in their search results.

Current:
If you were to query: your future, then 10 results would return and print to the page.

Expected:
I’d like to query: “your future”, then fewer than 10 results would return and print to the page.

Regards,
Mark
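With the standard Lucene query parser, an unquoted your future is split into two terms combined with the default operator (OR unless the schema or q.op says otherwise), whereas "your future" in double quotes is a phrase query that should match only documents containing the two words next to each other, in order. The same quoting should work when typed into the q box of the admin UI's Query screen. The minimal sketch below only builds the request URL to show how the quotes travel inside the q parameter; the core URL http://localhost:8983/solr/collection1/select and the helper name build_select_url are assumptions for illustration, and nothing here contacts Solr.

```python
from urllib.parse import urlencode

# Assumed Solr core URL; adjust host, port and core name for your setup.
SOLR_SELECT = "http://localhost:8983/solr/collection1/select"


def build_select_url(user_query, phrase=False):
    """Build a /select URL for a user's keywords.

    If phrase is True, the keywords are wrapped in double quotes so the
    standard Lucene query parser treats them as one phrase query.
    """
    q = f'"{user_query}"' if phrase else user_query
    # urlencode() percent-encodes the quotes (%22) and turns spaces into '+'.
    return SOLR_SELECT + "?" + urlencode({"q": q, "wt": "json"})


print(build_select_url("your future"))
print(build_select_url("your future", phrase=True))
```

The first URL asks for either term; the second sends q=%22your+future%22, which Solr decodes back to the quoted phrase.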
Re: Indexing URLs for Binaries
Check suffix-urlfilter.txt in your Nutch conf directory. You might be prohibiting those file types from the crawl.

- Mark

On 1/3/14, 10:29 AM, "Teague James" wrote:

> I am using Nutch 1.7 with Solr 4.6.0 to index websites that have links to binary files, such as Word, PDF, etc. The crawler crawls the site but I am not getting the URLs of the links for the binary files no matter how deep I set the settings for the site. I see the labels for the links in the content, but not the URLs. Any ideas on how I could get those URLs back in my crawl?
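To check whether a suffix rule is what drops the binary links, it can help to run the link URLs against the suffixes listed in the file. The sketch below is a simplified model, not Nutch's suffix-urlfilter plugin: it assumes one suffix per line with '#' comments, treats every listed suffix as a reason to reject, compares case-insensitively against the URL path only, and ignores any other directives the real file format supports (see the comments at the top of your own suffix-urlfilter.txt). The sample file contents and URLs are hypothetical.

```python
from urllib.parse import urlparse


def load_reject_suffixes(text):
    """Parse a simplified suffix list: one suffix per line, '#' starts a comment."""
    suffixes = []
    for line in text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            suffixes.append(line.lower())
    return suffixes


def rejected(url, suffixes):
    """Return True if the URL path ends with any listed suffix (case-insensitive)."""
    path = urlparse(url).path.lower()
    return any(path.endswith(s) for s in suffixes)


# Hypothetical file contents and link URLs, for illustration only.
sample = """
# binary formats
.pdf
.doc
"""
links = [
    "http://example.com/report.PDF",
    "http://example.com/notes.doc",
    "http://example.com/index.html",
]
suffixes = load_reject_suffixes(sample)
for link in links:
    print(link, "rejected" if rejected(link, suffixes) else "kept")
```

If .pdf or .doc show up as rejected against your real file, removing or commenting out those entries and re-crawling is the obvious next thing to try.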
Proxy.php tutorials for AJAX Solr
Are there any good tutorials that cover how to integrate the suggested PHP proxy for the JavaScript framework AJAX Solr?

Here is the proxy:
https://gist.github.com/evolvingweb/298580

Also on Stack Overflow:
http://stackoverflow.com/questions/20338073/proxy-php-tutorials-for-ajax-solr
Re: Client-side proxy for Solr 4.5.0
What about using JSONP techniques, since Solr can return its results as JSON key/value pairs?

On 11/26/13, 10:53 AM, "Markus Jelsma" wrote:

> I don't think you mean client-side proxy. You need a server side layer such as a normal web application or good proxy. We use Nginx, it is very fast and very feature rich. Its config scripting is usually enough to restrict access and limit input parameters. We also use Nginx's embedded Perl and Lua scripting besides its config scripting to implement more difficult logic.
>
> -Original message-
>> From: Reyes, Mark
>> Sent: Tuesday 26th November 2013 19:27
>> To: solr-user@lucene.apache.org
>> Subject: Client-side proxy for Solr 4.5.0
>>
>> Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so that the end-user can see their queries w/o being able to directly access :8983?
>>
>> Applications/frameworks used:
>> - Solr 4.5.0
>> - AJAX Solr (javascript library)
>>
>> Thank you,
>> Mark
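Markus's point is that the browser should talk to a server-side layer that forwards only approved requests to Solr, rather than reaching :8983 itself. JSONP (Solr's json.wrf parameter) changes how the browser receives the response, but the browser still calls Solr directly, so on its own it does not hide the port. The sketch below shows only the filtering step such a layer might perform: it keeps a hypothetical whitelist of read-only query parameters, caps rows, drops everything else, and builds the upstream /select URL. The whitelist, the row cap, the upstream URL, and the function name upstream_url are assumptions for illustration; the same idea could be expressed in Nginx config or in the PHP proxy.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical upstream core and whitelist; adjust to what your front end needs.
UPSTREAM = "http://localhost:8983/solr/collection1/select"
ALLOWED = {"q", "fq", "start", "rows", "fl", "sort", "wt", "facet", "facet.field"}
MAX_ROWS = 100


def upstream_url(client_query_string):
    """Keep only whitelisted parameters from the browser's query string and
    return the URL the proxy would request from Solr."""
    kept = []
    for key, value in parse_qsl(client_query_string):
        if key not in ALLOWED:
            continue  # drop anything not explicitly allowed (e.g. qt, stream.body)
        if key == "rows":
            # Cap page size so a client cannot pull the whole index in one request.
            value = str(min(int(value), MAX_ROWS)) if value.isdigit() else "10"
        kept.append((key, value))
    return UPSTREAM + "?" + urlencode(kept)


print(upstream_url("q=doctorate&rows=5000&qt=/update&wt=json"))
```

A whitelist is used rather than a blacklist so that any parameter nobody thought about is rejected by default.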
Client-side proxy for Solr 4.5.0
Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so that end users can see the results of their queries without being able to access port 8983 directly?

Applications/frameworks used:
- Solr 4.5.0
- AJAX Solr (JavaScript library)

Thank you,
Mark
Indexing data to a specific collection in Solr 4.5.0
Hi all:

I’m currently on a Solr 4.5.0 instance and working through this tutorial:
http://lucene.apache.org/solr/4_5_0/tutorial.html

My question is about indexing data as described in the tutorial:

$ java -jar post.jar solr.xml monitor.xml

The tutorial advises validating from your localhost:
http://localhost:8983/solr/collection1/select?q=solr&wt=xml

However, what if my Solr instance has both a collection1 and a collection2, and I want the XML files posted to collection2 only?

If possible, please advise.

Thanks,
Mark
Re: Indexing data to a specific collection in Solr 4.5.0
So then:

$ java -Durl=http://localhost:8983/solr/collection2/update -jar post.jar solr.xml monitor.xml

On 11/21/13, 8:14 AM, "xiezhide" wrote:

> add -Durl=http://localhost:8983/solr/collection2/update when run post.jar
> (This message was sent from 189 Mail.)
>
> "Reyes, Mark" wrote:
>
>> Hi all:
>>
>> I’m currently on a Solr 4.5.0 instance and running this tutorial,
>> http://lucene.apache.org/solr/4_5_0/tutorial.html
>>
>> My question is specific to indexing data as proposed from this tutorial,
>>
>> $ java -jar post.jar solr.xml monitor.xml
>>
>> The tutorial advises to validate from your localhost,
>> http://localhost:8983/solr/collection1/select?q=solr&wt=xml
>>
>> However, what if my Solr core has both a collection1 and collection2, yet I desire the XML files to only be posted to collection2 only?
>>
>> If possible, please advise.
>>
>> Thanks,
>> Mark
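post.jar POSTs each file to the -Durl target and, by default, commits afterwards, so the same result can be had by sending the XML straight to collection2's update handler over HTTP. A minimal sketch using only the standard library, assuming Solr runs at http://localhost:8983/solr; the helper names update_url and post_xml_file are hypothetical. post_xml_file needs a running Solr, so only the URL helper is exercised without one.

```python
import urllib.request

# Assumed Solr base URL; collection2 is the target core from the thread.
SOLR_BASE = "http://localhost:8983/solr"


def update_url(core, commit=True):
    """Return the update-handler URL for one core, optionally committing."""
    url = f"{SOLR_BASE}/{core}/update"
    return url + "?commit=true" if commit else url


def post_xml_file(path, core):
    """POST one Solr XML update file (e.g. monitor.xml) to the given core.

    Returns the HTTP status code; raises urllib.error.HTTPError on failure.
    """
    with open(path, "rb") as f:
        body = f.read()
    req = urllib.request.Request(
        update_url(core),
        data=body,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return resp.status
```

Usage would be post_xml_file("monitor.xml", "collection2"). Committing on every request is simple but costly for many files; committing once after the last file is cheaper.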
Nutch 1.7 solrdedup error
When trying to delete duplicates after a crawl, I get the following error:
http://pastebin.com/aQbqmPLm

This happens when running this command in the terminal:

$ bin/nutch solrdedup http://localhost:8983/solr/rockies

Here is my setup:
- Nutch 1.7
- Solr 4.5.0
- java version "1.6.0_51"

Also on Stack Overflow:
http://stackoverflow.com/questions/20013630/nutch-1-7-solrdedup-error

Thanks,
Mark
Nutch 1.7 + AJAX Solr returning ALL contents vs. SPECIFIC
Hi:

I was encouraged to ask on the Solr mailing list, specifically about the fl parameter. What is that parameter for, and can it accomplish my original task of crawling/indexing specific HTML elements rather than the entire page?

My original question is below (previously on the Nutch mailing list):

---
I’m using Nutch 1.7 to crawl and index the pages of my domain into Solr, and the JavaScript library AJAX Solr to fetch that index as JSON and print it to the front end. My question is: is it possible to have specific content returned (e.g. an H2 tag and a p tag) on the search results page, rather than all the contents of that page?
---

Thanks again,
Mark
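Roughly: fl ("field list") only chooses which stored fields Solr returns for each matching document; it cannot pick out part of a field's text. To show just an H2 and a paragraph, those pieces would need to be extracted at index time into their own stored fields. In a Nutch setup that would normally happen in a parse or indexing plugin; the sketch below swaps in Python's standard html.parser purely to illustrate the extraction step. The field names h2 and first_p are hypothetical and would have to exist in your schema.

```python
from html.parser import HTMLParser

# Hypothetical Solr field names for the extracted snippets.
TAG_TO_FIELD = {"h2": "h2", "p": "first_p"}


class SnippetExtractor(HTMLParser):
    """Capture the text of the first <h2> and the first <p> in an HTML page."""

    def __init__(self):
        super().__init__()
        self.fields = {}   # field name -> extracted text
        self._tag = None   # tag currently being captured, if any
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        field = TAG_TO_FIELD.get(tag)
        # Start capturing only if idle and this field has not been filled yet.
        if field and self._tag is None and field not in self.fields:
            self._tag = tag
            self._buffer = []

    def handle_endtag(self, tag):
        if tag == self._tag:
            self.fields[TAG_TO_FIELD[tag]] = "".join(self._buffer).strip()
            self._tag = None

    def handle_data(self, data):
        # Text inside nested inline tags (e.g. <b>) is included as well.
        if self._tag is not None:
            self._buffer.append(data)


page = ("<html><body><h1>Site</h1><h2>Doctorate programs</h2>"
        "<p>Learn <b>more</b> here.</p><p>Other text</p></body></html>")
extractor = SnippetExtractor()
extractor.feed(page)
print(extractor.fields)
```

Once fields like these are indexed and stored, a request such as q=doctorate&fl=url,h2,first_p would return only those fields to AJAX Solr.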
Setting up Multiple Cores on Solr 4.5.0
Is there any good, recent documentation I can reference on setting up multiple cores in Solr 4.5.0?

Thanks all,
Mark
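Since Solr 4.4, a setup can use core discovery: at startup Solr walks the solr home directory and treats each directory containing a core.properties file as a core, with that directory's conf/ holding solrconfig.xml and schema.xml. The sketch below assumes that discovery-style layout (as in the 4.5 example's solr/ directory) and creates a second core by copying an existing core's conf/ and writing a core.properties with a name= entry. The paths, core names, and the helper create_core are assumptions; Solr needs a restart (or a CoreAdmin CREATE call) to pick up the new core.

```python
import shutil
from pathlib import Path


def create_core(solr_home, template_core, new_core):
    """Create a discovery-style core directory by cloning another core's conf/.

    Assumes the layout <solr_home>/<core>/conf/{solrconfig.xml,schema.xml}.
    """
    home = Path(solr_home)
    src_conf = home / template_core / "conf"
    dst_dir = home / new_core
    if dst_dir.exists():
        raise FileExistsError(f"{dst_dir} already exists")
    # copytree creates dst_dir/conf (and dst_dir itself) as needed.
    shutil.copytree(src_conf, dst_dir / "conf")
    # core.properties marks the directory as a core; name= sets the core name.
    (dst_dir / "core.properties").write_text(f"name={new_core}\n", encoding="utf-8")
    return dst_dir
```

For example, create_core("example/solr", "collection1", "collection2") would give collection2 its own copy of the configuration, which can then be edited independently.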
Re: Exclude urls without 'www' from Nutch 1.7 crawl
Noted, and will do (that goes for both the suggestions and for putting this on the Nutch list instead).

Thanks all,
Mark

On 11/1/13, 10:53 AM, "Furkan KAMACI" wrote:

> As Markus pointed Nutch has a feature for such kind of situation. Here is Solr list but one more thing for you: www.mywebsite.com and mywebsite.com may point to "different" pages.
>
> 2013/11/1 Markus Jelsma
>
>> Hi - Use the domain-urlfilter for host, domain and TLD filtering.
>>
>> Also, please ask questions on the Nutch list, you're on Solr now :)
>>
>> -Original message-
>>> From: Reyes, Mark
>>> Sent: Friday 1st November 2013 17:24
>>> To: solr-user@lucene.apache.org
>>> Subject: Exclude urls without 'www' from Nutch 1.7 crawl
>>>
>>> I'm currently using Nutch 1.7 to crawl my domain. My issue is specific to URLs being indexed as www vs. non-www.
>>>
>>> Specifically, after firing the crawl and index to Solr 4.5 then validating the results on the front-end with AJAX Solr, the search results page lists results/pages that are both 'www' and '' urls such as:
>>>
>>> www.mywebsite.com
>>> mywebsite.com
>>> www.mywebsite.com/page1
>>> mywebsite.com/page1
>>>
>>> My understanding is that the url filtering (regex-urlfilter.txt) needs modification. Are there any regex/nutch experts that could suggest a solution?
>>>
>>> Here is the code on paste bin,
>>> http://pastebin.com/Cp6vUxPR
>>>
>>> Also on stack overflow,
>>> http://stackoverflow.com/questions/19731904/exclude-urls-without-www-from-nutch-1-7-crawl
>>>
>>> Thank you,
>>> Mark
Exclude urls without 'www' from Nutch 1.7 crawl
I'm currently using Nutch 1.7 to crawl my domain. My issue is specific to URLs being indexed as www vs. non-www.

Specifically, after running the crawl and indexing to Solr 4.5, then validating the results on the front end with AJAX Solr, the search results page lists pages under both www and non-www URLs, such as:

www.mywebsite.com
mywebsite.com
www.mywebsite.com/page1
mywebsite.com/page1

My understanding is that the URL filtering (regex-urlfilter.txt) needs modification. Are there any regex/Nutch experts who could suggest a solution?

Here is the code on Pastebin:
http://pastebin.com/Cp6vUxPR

Also on Stack Overflow:
http://stackoverflow.com/questions/19731904/exclude-urls-without-www-from-nutch-1-7-crawl

Thank you,
Mark
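regex-urlfilter.txt is read top to bottom: each rule line is a '+' (include) or '-' (exclude) followed by a regular expression, the first pattern that matches a URL decides its fate, and a URL that matches no pattern is ignored. The sketch below models that first-match rule so candidate rules can be tried against the example URLs before re-crawling. It is a simplified model, not Nutch's implementation: Nutch uses Java regular expressions, and Python's re.search is assumed to behave the same for simple patterns like these. The two rules are a hypothetical way to keep only www.mywebsite.com; if your file ends with a catch-all +. line, a more specific rule placed before it takes precedence.

```python
import re

# Hypothetical rules: accept only www.mywebsite.com, reject everything else.
RULES = r"""
# first matching line wins
+^https?://www\.mywebsite\.com(/|$)
-.
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (include, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments, and anything that is not a +/- rule in this model.
        if not line or line[0] not in "+-":
            continue
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepted(url, rules):
    """Return True if the first matching rule is '+'; unmatched URLs are rejected."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


rules = parse_rules(RULES)
for url in ["http://www.mywebsite.com/page1", "http://mywebsite.com/page1"]:
    print(url, "kept" if accepted(url, rules) else "dropped")
```

The (/|$) after .com keeps lookalike hosts such as www.mywebsite.com.example.net from slipping through the first rule.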
Re: AJAX Solr returning the default wildcard *:* and not what I query
I just tweaked the reuters.js example to listen to the window.location object, and that resolved the wildcard (*:*) results. I put it on Pastebin:
http://pastebin.com/GyC4RMva

Thanks for the replies, everyone,
Mark

---
P. 866.475.0317 x 3244
Bridgepoint Education
INNOVATIVE SOLUTIONS THAT ADVANCE LEARNING SM

On 10/31/13, 12:23 AM, "Raymond Wiker" wrote:

> The parameters indicate a jQuery.ajax call with result type "jsonp" - a
Re: AJAX Solr returning the default wildcard *:* and not what I query
Here is the solr.log file from Solr 4.5:
http://pastebin.com/zSpERJZA

Thanks Shawn,
Mark

On 10/30/13, 12:44 PM, "Shawn Heisey" wrote:

> On 10/30/2013 1:26 PM, Reyes, Mark wrote:
>> I am currently integrating JavaScript framework AJAX Solr to my domain. I am trying to query words such as 'doctorate' or 'programs' but the console is reporting '*:*' only the default wildcard.
>>
>> Just curious if anyone has any helpful hints? The problem can be seen in detail on Stackoverflow,
>>
>> http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query
>
> We would have to know what Solr is actually receiving from your app. The Solr log should have an entry for every query you do, and it includes all of the parameters for that query. This is *not* the Logging tab in the admin UI, but the actual logfile. On Solr 4.3 and later with the example logging setup, this is typically $CWD/logs/solr.log.
>
> Thanks,
> Shawn
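Shawn's suggestion comes down to finding the q value Solr actually received. In Solr 4.x request log lines the parameters typically appear as params={...} followed by hits= and/or status=; the sketch below assumes that shape (it may differ with other logging configurations) and pulls out the handler path and q with a regular expression plus urllib.parse.parse_qs. The sample lines are made up for illustration and are not taken from the pastebin; the function name queries_from_log is hypothetical.

```python
import re
from urllib.parse import parse_qs

# Assumed Solr 4.x log shape: "... path=/select params={...} hits=N status=0 QTime=N"
PARAMS_RE = re.compile(r"path=(\S+) params=\{(.*)\}\s+(?:hits|status)=")


def queries_from_log(lines):
    """Yield (handler path, q value) for each log line that records request params."""
    for line in lines:
        match = PARAMS_RE.search(line)
        if not match:
            continue
        params = parse_qs(match.group(2), keep_blank_values=True)
        # q can be missing entirely; report None in that case.
        yield match.group(1), params.get("q", [None])[0]


# Made-up sample lines for illustration (not from the pastebin).
sample = [
    "INFO  - 2013-10-30 12:00:00.000; org.apache.solr.core.SolrCore; "
    "[collection1] webapp=/solr path=/select params={q=*:*&wt=json} hits=120 status=0 QTime=2",
    "INFO  - 2013-10-30 12:00:05.000; org.apache.solr.core.SolrCore; "
    "[collection1] webapp=/solr path=/select params={q=doctorate&wt=json} hits=3 status=0 QTime=1",
]
for path, q in queries_from_log(sample):
    print(path, q)
```

Pointed at the real file (for example by passing an open handle on logs/solr.log), every line that reports q as *:* while a word was typed into the search box points at the front end, not at Solr.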
AJAX Solr returning the default wildcard *:* and not what I query
I am currently integrating the JavaScript framework AJAX Solr into my domain. When I query words such as 'doctorate' or 'programs', the console reports only '*:*', the default wildcard query.

Does anyone have any helpful hints? The problem is described in detail on Stack Overflow:
http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query

Thank you,
Mark