Strict Search in Apache Solr

2014-05-05 Thread Reyes, Mark
How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to 
provide specificity in their search results.

Current:
If you were to query: your future, then 10 results would be returned and printed to 
the page.

Expected:
I’d like to query: “your future”, and have fewer than 10 results returned and 
printed to the page.

Regards,
Mark

IMPORTANT NOTICE: This e-mail message is intended to be received only by 
persons entitled to receive the confidential information it may contain. E-mail 
messages sent from Bridgepoint Education may contain information that is 
confidential and may be legally privileged. Please do not read, copy, forward 
or store this message unless you are an intended recipient of it. If you 
received this transmission in error, please notify the sender by reply e-mail 
and delete the message and any attachments.
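The following is a minimal sketch of the idea. It assumes Solr's standard (lucene) query parser, the example server at http://localhost:8983/solr, and a core named collection1. Wrapping the terms in double quotes turns them into a phrase query. The query your future then matches documents containing either word (with the default OR operator), while "your future" only matches documents where the two words appear next to each other, in that order. The helper names to_phrase_query and select_url are hypothetical. The code only builds URLs and does not contact Solr.

```python
from urllib.parse import urlencode

SOLR_BASE = "http://localhost:8983/solr"  # assumption: example server and port
CORE = "collection1"                       # assumption: example core name


def to_phrase_query(user_input: str) -> str:
    """Wrap raw input in double quotes so the standard query parser treats
    it as a phrase. Backslashes and embedded quotes are escaped first so
    they cannot end the phrase early."""
    escaped = user_input.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def select_url(q: str, rows: int = 10) -> str:
    """Build a /select URL; urlencode percent-encodes quotes and spaces."""
    params = {"q": q, "wt": "json", "rows": rows}
    return f"{SOLR_BASE}/{CORE}/select?{urlencode(params)}"


# Unquoted: with the default OR operator, either term can match.
print(select_url("your future"))
# Quoted: only documents containing the exact phrase should match.
print(select_url(to_phrase_query("your future")))
```

If the front end passes the user's input through unchanged, quotes typed by the user should already reach Solr as a phrase query. A helper like to_phrase_query is only needed when the application wants to force exact-phrase matching itself.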

Re: Strict Search in Apache Solr

2014-05-05 Thread Reyes, Mark
Okay, let’s try it this way…

CURRENTLY:
Step 1: Type your future into the search bar.
Step 2: 10 search results return.

I’D LIKE TO SEE THIS:
Step 1: Type “your future” into the search bar.
Step 2: 1 search result returns.

Can this be accomplished through the Solr UI?

Thanks,

Mark

On 5/5/14, 3:17 PM, Ahmet Arslan iori...@yahoo.com wrote:

Hi Reyes,

I think your question is not clear.
Please see : https://wiki.apache.org/solr/UsingMailingLists

Ahmet

On Tuesday, May 6, 2014 12:23 AM, Reyes, Mark mark.re...@bpiedu.com
wrote:
How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to
provide specificity in their search results.

Current:
If you were to query: your future, then 10 results would be returned and printed
to the page.

Expected:
I’d like to query: “your future”, and have fewer than 10 results returned
and printed to the page.

Regards,
Mark


Re: Indexing URLs for Binaries

2014-01-03 Thread Reyes, Mark
Check suffix-urlfilter.txt in your conf directory for Nutch. You might be
prohibiting those filetypes from the crawl.

- Mark






On 1/3/14, 10:29 AM, Teague James teag...@insystechinc.com wrote:

I am using Nutch 1.7 with Solr 4.6.0 to index websites that have links to
binary files, such as Word, PDF, etc. The crawler crawls the site but I am
not getting the URLs of the links for the binary files no matter how deep I
set the settings for the site. I see the labels for the links in the
content, but not the URLs. Any ideas on how I could get those URLs back in
my crawl?
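Mark's suggestion above is that a URL filter may be dropping the binary links before they are fetched. The sketch below is a simplified Python model of suffix-based filtering, not Nutch's actual plugin code. The suffix list is hypothetical, so compare it with the real entries in suffix-urlfilter.txt (and regex-urlfilter.txt) in your Nutch conf directory.

```python
# Hypothetical suffix list; compare with the real files in Nutch's conf/.
REJECTED_SUFFIXES = (".pdf", ".doc", ".docx", ".zip")


def passes_suffix_filter(url, rejected=REJECTED_SUFFIXES):
    """Return False when the URL's path ends with a rejected suffix.

    Simplified model: the check is case-insensitive and ignores any
    query string or fragment.
    """
    path = url.split("#", 1)[0].split("?", 1)[0].lower()
    return not path.endswith(tuple(rejected))


outlinks = [
    "http://example.com/about.html",
    "http://example.com/files/report.pdf",
    "http://example.com/files/Form.DOC?v=2",
]
kept = [u for u in outlinks if passes_suffix_filter(u)]
print(kept)  # ['http://example.com/about.html']
```

Passing rejected=() shows what would survive with no suffix filtering at all.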




Proxy.php tutorials for AJAX Solr

2013-12-02 Thread Reyes, Mark
Are there any good tutorials that touch on how to integrate the suggested PHP 
proxy for the JavaScript framework AJAX Solr?

Here is the proxy, https://gist.github.com/evolvingweb/298580

Also on Stackoverflow, 
http://stackoverflow.com/questions/20338073/proxy-php-tutorials-for-ajax-solr


Re: Client-side proxy for Solr 4.5.0

2013-11-27 Thread Reyes, Mark
What about using JSONP techniques, since the results from the Solr
instance come back as key/value pairs?
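For reference, Solr 4.x returns JSONP when a request includes wt=json together with json.wrf=<callback name>: the JSON response comes back wrapped in a call to that function. Because the browser still loads that script straight from the Solr server, JSONP on its own does not hide port 8983 from end users. The sketch below builds such a URL and unwraps a response body. The names jsonp_url and unwrap_jsonp, the callback name, and the sample body are all illustrative assumptions.

```python
import json
import re
from urllib.parse import urlencode


def jsonp_url(core_url, q, callback):
    """Build a /select URL asking Solr for JSONP via the json.wrf parameter."""
    return f"{core_url}/select?" + urlencode({"q": q, "wt": "json", "json.wrf": callback})


def unwrap_jsonp(body, callback):
    """Strip 'callback(...)' (and an optional ';') and parse the JSON inside."""
    pattern = rf"\s*{re.escape(callback)}\((.*)\)\s*;?\s*"
    match = re.fullmatch(pattern, body, flags=re.DOTALL)
    if match is None:
        raise ValueError("response is not wrapped in the expected callback")
    return json.loads(match.group(1))


url = jsonp_url("http://localhost:8983/solr/collection1", "solr", "handleResults")
print(url)  # the browser would request this directly from port 8983

body = 'handleResults({"response":{"numFound":1,"docs":[{"id":"doc-1"}]}})'
print(unwrap_jsonp(body, "handleResults")["response"]["numFound"])  # 1
```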


On 11/26/13, 10:53 AM, Markus Jelsma markus.jel...@openindex.io wrote:

I don't think you mean client-side proxy. You need a server side layer
such as a normal web application or good proxy. We use Nginx, it is very
fast and very feature rich. Its config scripting is usually enough to
restrict access and limit input parameters. We also use Nginx's embedded
Perl and Lua scripting besides its config scripting to implement more
difficult logic.

 
 
-Original message-
 From:Reyes, Mark mark.re...@bpiedu.com
 Sent: Tuesday 26th November 2013 19:27
 To: solr-user@lucene.apache.org
 Subject: Client-side proxy for Solr 4.5.0
 
 Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance
so that the end-user can see their queries without being able to directly
access :8983?
 
 Applications/frameworks used:
 - Solr 4.5.0
 - AJAX Solr (javascript library)
 
 Thank you,
 Mark
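Markus describes a server-side layer that restricts access and limits input parameters. The following is a minimal Python sketch of only the parameter-limiting step. It assumes the browser talks to the proxy and only the proxy can reach Solr on an internal address. The whitelist, the row cap, and build_upstream_url are hypothetical choices, not Nginx configuration or the AJAX Solr proxy.php.

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical internal Solr address that only the proxy can reach.
UPSTREAM = "http://127.0.0.1:8983/solr/collection1/select"

# Parameters the browser may set; anything else (qt, shards,
# stream.body, ...) is dropped instead of forwarded.
ALLOWED = {"q", "fq", "start", "rows", "sort", "facet", "facet.field", "json.wrf"}
MAX_ROWS = 50


def build_upstream_url(client_query_string):
    """Filter the browser's query string down to whitelisted Solr params."""
    kept = []
    for key, value in parse_qsl(client_query_string):
        if key not in ALLOWED:
            continue
        if key == "rows":
            # Cap the page size; non-numeric input falls back to 10.
            value = str(min(int(value), MAX_ROWS)) if value.isdigit() else "10"
        kept.append((key, value))
    kept.append(("wt", "json"))  # always force the response format
    return UPSTREAM + "?" + urlencode(kept)


print(build_upstream_url("q=doctorate&rows=1000&qt=%2Fupdate&stream.body=x"))
```

A whitelist fails closed: a parameter nobody thought about is dropped rather than forwarded. Wiring the function into an actual HTTP server is left out.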
 

Client-side proxy for Solr 4.5.0

2013-11-26 Thread Reyes, Mark
Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so that 
the end-user can see their queries without being able to directly access :8983?

Applications/frameworks used:
- Solr 4.5.0
- AJAX Solr (javascript library)

Thank you,
Mark


Indexing data to a specific collection in Solr 4.5.0

2013-11-21 Thread Reyes, Mark
Hi all:

I’m currently on a Solr 4.5.0 instance and running this tutorial, 
http://lucene.apache.org/solr/4_5_0/tutorial.html

My question is specific to indexing data as described in this tutorial:

$ java -jar post.jar solr.xml monitor.xml

The tutorial advises validating the results from your localhost:
http://localhost:8983/solr/collection1/select?q=solr&wt=xml

However, what if my Solr instance has both a collection1 and a collection2 core, 
and I want the XML files to be posted to collection2 only?

If possible, please advise.

Thanks,
Mark
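Each core in a Solr 4.x instance is served under its own path (/solr/collection1/, /solr/collection2/, ...), so documents sent to a core's own /update handler are indexed into that core only. post.jar can be pointed at a different handler with its -Durl system property, given before -jar. The Python sketch below shows the same idea over plain HTTP. The helper name, the inline sample document, and the commit=true choice are assumptions. The request is only built here, not sent.

```python
from urllib.request import Request

SOLR_BASE = "http://localhost:8983/solr"  # assumption: example server and port


def build_update_request(core, xml_payload, commit=True):
    """Prepare a POST of Solr XML update commands to one core's /update handler."""
    url = f"{SOLR_BASE}/{core}/update"
    if commit:
        url += "?commit=true"  # make the documents searchable right away
    return Request(
        url,
        data=xml_payload,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


doc = b'<add><doc><field name="id">example-1</field></doc></add>'
req = build_update_request("collection2", doc)
print(req.get_method(), req.full_url)
# To actually send it (requires a running Solr):
# from urllib.request import urlopen; urlopen(req).read()
```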


Re: Indexing data to a specific collection in Solr 4.5.0

2013-11-21 Thread Reyes, Mark
So then,
$ java -Durl=http://localhost:8983/solr/collection2/update -jar post.jar solr.xml monitor.xml





On 11/21/13, 8:14 AM, xiezhide xiezh...@gmail.com wrote:


Add -Durl=http://localhost:8983/solr/collection2/update (before -jar) when you run post.jar.
(Sent from 189 Mail)

Reyes, Mark mark.re...@bpiedu.com wrote:

Hi all:

I’m currently on a Solr 4.5.0 instance and running this tutorial,
http://lucene.apache.org/solr/4_5_0/tutorial.html

My question is specific to indexing data as described in this tutorial:

$ java -jar post.jar solr.xml monitor.xml

The tutorial advises validating the results from your localhost:
http://localhost:8983/solr/collection1/select?q=solr&wt=xml

However, what if my Solr instance has both a collection1 and a collection2
core, and I want the XML files to be posted to collection2 only?

If possible, please advise.

Thanks,
Mark


Nutch 1.7 solrdedup error

2013-11-17 Thread Reyes, Mark
When I try to delete duplicates after a crawl, I get the following error,
http://pastebin.com/aQbqmPLm

when running this command in a terminal:

$ bin/nutch solrdedup http://localhost:8983/solr/rockies

Here is my setup:
- Nutch 1.7
- Solr 4.5.0
- java version 1.6.0_51

On Stackoverflow as well,
http://stackoverflow.com/questions/20013630/nutch-1-7-solrdedup-error

Thanks,
Mark


Nutch 1.7 + AJAX Solr returning ALL contents vs. SPECIFIC

2013-11-11 Thread Reyes, Mark
Hi:

I was encouraged to explore the Solr mailing list, specifically regarding the 
fl parameter. What is that parameter for, and can it accomplish my original 
task of crawling/indexing specific HTML components instead of parsing the entire 
page?

My original question is listed below (previously on the Nutch mail list):

---
I’m using Nutch 1.7 to crawl/index the pages of my domain to Solr, and the 
JavaScript library AJAX Solr to retrieve results from that index as JSON and 
print them to the front end.
My question is whether it’s possible to have specific content returned (e.g. an 
H2 tag and a p tag) on the search results page instead of all the contents of 
that page.
---

Thanks again,
Mark
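For reference, the fl (field list) parameter selects which stored fields Solr returns for each matching document. It does not change what Nutch crawls or what gets indexed. It can trim results down to, say, a title and a URL, but returning only an h2 and a p element would presumably require those elements to be extracted into their own fields at index time. The small sketch below assumes the Nutch schema's title and url fields and uses hypothetical helper names. apply_fl is only a local model of what fl does, not a Solr API.

```python
from typing import Mapping, Sequence
from urllib.parse import urlencode


def select_params(q: str, fields: Sequence[str]) -> str:
    """Query string asking Solr to return only the listed stored fields."""
    return urlencode({"q": q, "fl": ",".join(fields), "wt": "json"})


def apply_fl(doc: Mapping[str, object], fields: Sequence[str]) -> dict:
    """Local model of fl: keep only the requested fields of one document."""
    return {name: doc[name] for name in fields if name in doc}


print(select_params("programs", ["title", "url"]))  # q=programs&fl=title%2Curl&wt=json

indexed_doc = {"url": "http://www.mywebsite.com/page1", "title": "Page 1",
               "content": "...the full parsed text of the page..."}
print(apply_fl(indexed_doc, ["title", "url"]))
```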






Setting up Multiple Cores on Solr 4.5.0

2013-11-10 Thread Reyes, Mark
Any good/recent documentation that I can reference on setting up multiple cores 
in Solr 4.5.0?

Thanks all,
Mark



Exclude urls without 'www' from Nutch 1.7 crawl

2013-11-01 Thread Reyes, Mark
I'm currently using Nutch 1.7 to crawl my domain. My issue is specific to URLs 
being indexed as www vs. non-www.

Specifically, after running the crawl and indexing to Solr 4.5, then validating 
the results on the front end with AJAX Solr, the search results page lists 
results/pages with both 'www' and non-'www' URLs, such as:

www.mywebsite.com
mywebsite.com
www.mywebsite.com/page1
mywebsite.com/page1

My understanding is that the url filtering (regex-urlfilter.txt) needs 
modification. Are there any regex/nutch experts that could suggest a solution?

Here is the code on paste bin,
http://pastebin.com/Cp6vUxPR

Also on stack overflow,
http://stackoverflow.com/questions/19731904/exclude-urls-without-www-from-nutch-1-7-crawl

Thank you,
Mark
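Nutch's regex-urlfilter.txt holds one rule per line: a + (accept) or - (reject) followed by a regular expression. Rules are tried in order, and the first pattern found in the URL decides. A URL that matches no rule is rejected. The Python model below mimics that behaviour so candidate rules can be tested offline. The rules and the mywebsite.com host are illustrative. Python's re stands in for Java regex, and this is not Nutch's own code.

```python
import re

# Illustrative rules in regex-urlfilter.txt style (sign, then pattern).
RULES_TEXT = r"""
# reject the bare host explicitly
-^https?://mywebsite\.com(/|$)
# accept anything on the www host
+^https?://www\.mywebsite\.com(/|$)
# reject everything else
-.
"""


def parse_rules(text):
    """Turn '+pattern' / '-pattern' lines into (accept, regex) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        if line[0] not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((line[0] == "+", re.compile(line[1:])))
    return rules


def accepts(url, rules):
    """The first rule whose pattern is found in the URL decides; no match rejects."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False


rules = parse_rules(RULES_TEXT)
for url in ["http://www.mywebsite.com/page1", "http://mywebsite.com/page1"]:
    print(url, accepts(url, rules))
```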



Re: Exclude urls without 'www' from Nutch 1.7 crawl

2013-11-01 Thread Reyes, Mark
Noted, and will do (that goes for both the suggestions and putting this on
the Nutch list instead).

Thanks all,
Mark



On 11/1/13, 10:53 AM, Furkan KAMACI furkankam...@gmail.com wrote:

As Markus pointed out, Nutch has a feature for this kind of situation. This is
the Solr list, but one more thing for you: www.mywebsite.com and
mywebsite.com may point to different pages.


2013/11/1 Markus Jelsma markus.jel...@openindex.io

 Hi - Use the domain-urlfilter for host, domain and TLD filtering.

 Also, please ask questions on the Nutch list, you're on Solr now :)


 -Original message-
  From:Reyes, Mark mark.re...@bpiedu.com
  Sent: Friday 1st November 2013 17:24
  To: solr-user@lucene.apache.org
  Subject: Exclude urls without 'www' from Nutch 1.7 crawl
 
  I'm currently using Nutch 1.7 to crawl my domain. My issue is specific
 to URLs being indexed as www vs. non-www.
 
  Specifically, after running the crawl and indexing to Solr 4.5, then
 validating the results on the front end with AJAX Solr, the search results
 page lists results/pages with both 'www' and non-'www' URLs, such as:
 
  www.mywebsite.com
  mywebsite.com
  www.mywebsite.com/page1
  mywebsite.com/page1
 
  My understanding is that the url filtering (regex-urlfilter.txt) needs
 modification. Are there any regex/nutch experts that could suggest a
 solution?
 
  Here is the code on paste bin,
  http://pastebin.com/Cp6vUxPR
 
  Also on stack overflow,
  http://stackoverflow.com/questions/19731904/exclude-urls-without-www-from-nutch-1-7-crawl
 
  Thank you,
  Mark
 
 

Re: AJAX Solr returning the default wildcard *:* and not what I query

2013-10-31 Thread Reyes, Mark
I just tweaked the reuters.js example to listen to the window.location
object and it resolved the wildcard returns.

I put it on pastebin,
http://pastebin.com/GyC4RMva


Thanks for the reply everyone,
Mark
---


P. 866.475.0317 x 3244
Bridgepoint Education
INNOVATIVE SOLUTIONS THAT ADVANCE LEARNING SM




On 10/31/13, 12:23 AM, Raymond Wiker rwi...@gmail.com wrote:

The parameters indicate a jQuery.ajax call with result type jsonp - a
script tag is inserted into the web page, where the script url contains
the actual query parameters. This should be pretty painless to debug in
Google Chrome and Safari, at least - these two browsers have pretty neat
debug/inspection capabilities.
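Following Raymond's point, the query that was actually sent can be read straight off the script URL shown in the browser's network panel. Below is a small Python helper for decoding such a URL. The sample URL, including the jQuery-style callback name and the cache-busting _ parameter, is made up for illustration.

```python
from urllib.parse import parse_qs, urlsplit


def sent_params(script_src):
    """Decode the query string of a captured JSONP request URL."""
    return parse_qs(urlsplit(script_src).query)


# Made-up example of a script URL copied from the browser's network panel.
src = ("http://localhost:8983/solr/collection1/select?q=*%3A*&wt=json"
       "&json.wrf=jQuery1234_5678&_=1383153600000")
params = sent_params(src)
print(params["q"])  # ['*:*']: the default query, not the search box text
```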


On Wed, Oct 30, 2013 at 9:07 PM, Anshum Gupta
ans...@anshumgupta.net wrote:

 As Shawn pointed out, it seems like your client is actually sending out
 *:* queries all of the time.
 Perhaps you have the wrong id for the search box, or something else that
 results in your AJAX library never actually receiving the input value,
 but I'm just guessing.



 On Thu, Oct 31, 2013 at 1:25 AM, Reyes, Mark mark.re...@bpiedu.com
 wrote:

  solr.log file per Solr 4.5
 
  http://pastebin.com/zSpERJZA
 
 
  Thanks Shawn,
  Mark
 
 
 
  On 10/30/13, 12:44 PM, Shawn Heisey s...@elyograg.org wrote:
 
  On 10/30/2013 1:26 PM, Reyes, Mark wrote:
   I am currently integrating the JavaScript framework AJAX Solr into my
   domain. I am trying to query words such as 'doctorate' or 'programs',
   but the console reports only '*:*', the default wildcard.
  
   Just curious if anyone has any helpful hints? The problem can be seen
   in detail on Stackoverflow,
   http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query
  
  We would have to know what Solr is actually receiving from your app. The
  Solr log should have an entry for every query you do, and it includes
  all of the parameters for that query. This is *not* the Logging tab in
  the admin UI, but the actual logfile. On Solr 4.3 and later with the
  example logging setup, this is typically $CWD/logs/solr.log.
  
  Thanks,
  Shawn
  
 
 
 



 --

 Anshum Gupta
 http://www.anshumgupta.net




AJAX Solr returning the default wildcard *:* and not what I query

2013-10-30 Thread Reyes, Mark
I am currently integrating the JavaScript framework AJAX Solr into my domain. I am 
trying to query words such as 'doctorate' or 'programs', but the console 
reports only '*:*', the default wildcard.

Just curious if anyone has any helpful hints? The problem can be seen in detail 
on Stackoverflow,
http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query

Thank you,
Mark


Re: AJAX Solr returning the default wildcard *:* and not what I query

2013-10-30 Thread Reyes, Mark
solr.log file per Solr 4.5

http://pastebin.com/zSpERJZA


Thanks Shawn,
Mark



On 10/30/13, 12:44 PM, Shawn Heisey s...@elyograg.org wrote:

On 10/30/2013 1:26 PM, Reyes, Mark wrote:
 I am currently integrating the JavaScript framework AJAX Solr into my domain.
I am trying to query words such as 'doctorate' or 'programs', but the
console reports only '*:*', the default wildcard.

 Just curious if anyone has any helpful hints? The problem can be seen
in detail on Stackoverflow,
 http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query

We would have to know what Solr is actually receiving from your app. The
Solr log should have an entry for every query you do, and it includes
all of the parameters for that query. This is *not* the Logging tab in
the admin UI, but the actual logfile.  On Solr 4.3 and later with the
example logging setup, this is typically $CWD/logs/solr.log.

Thanks,
Shawn
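Along the lines of Shawn's suggestion, here is a short Python sketch that scans solr.log for the q parameter of each request. The regular expression assumes a Solr 4.x request line containing path=... params={...}. The sample lines are invented, so check the pattern against your own log.

```python
import re
from urllib.parse import parse_qs

# Assumed shape of a Solr 4.x request log line; verify against your solr.log.
# Values containing "}" would cut the match short.
PARAMS_RE = re.compile(r"path=(\S+) params=\{(.*?)\}")


def queries_from_log(lines):
    """Yield (handler path, q) for every logged request that has a q parameter."""
    for line in lines:
        match = PARAMS_RE.search(line)
        if match is None:
            continue
        for q in parse_qs(match.group(2)).get("q", []):
            yield match.group(1), q


sample_log = [
    "INFO  - 2013-10-30 12:00:00.000; org.apache.solr.core.SolrCore; "
    "[collection1] webapp=/solr path=/select params={q=*:*&wt=json} hits=0 status=0 QTime=1",
    "INFO  - 2013-10-30 12:00:05.000; org.apache.solr.core.SolrCore; "
    "[collection1] webapp=/solr path=/select params={q=doctorate&wt=json} hits=0 status=0 QTime=1",
]
for path, q in queries_from_log(sample_log):
    print(path, q)
```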


