Re: Strict Search in Apache Solr

2014-05-05 Thread Reyes, Mark
Okay, let's try it this way...

CURRENTLY:
Step 1: Type your future into the search bar.
Step 2: 10 search results return.

I'D LIKE TO SEE THIS:
Step 1: Type "your future" into the search bar.
Step 2: 1 search result returns.

Can this be accomplished through the Solr UI?
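
For reference, Solr's standard query parser treats double-quoted terms as a phrase query, matching documents where the words appear together and in order, while unquoted terms are matched independently (assuming the default OR operator, a document containing either word can match). Typing the quoted text into the q box of the admin UI's Query tab, or sending it from the front-end, is one way to get this behavior. A minimal Python sketch that builds such a request URL, assuming a core named collection1 on localhost:8983; the helper name phrase_select_url is hypothetical:

```python
from urllib.parse import urlencode


def phrase_select_url(base_url, phrase, rows=10):
    """Build a /select URL that asks Solr to match `phrase` as an exact phrase."""
    # Escape backslashes and embedded double quotes so the phrase is not cut short.
    escaped = phrase.replace("\\", "\\\\").replace('"', '\\"')
    # Wrapping the terms in double quotes turns them into a phrase query
    # instead of independent term matches.
    params = {"q": f'"{escaped}"', "rows": rows, "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


print(phrase_select_url("http://localhost:8983/solr/collection1", "your future"))
```

Whether this returns fewer results depends on the indexed content and on which default field the query searches.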

Thanks,

Mark

On 5/5/14, 3:17 PM, "Ahmet Arslan"  wrote:

>Hi Reyes,
>
>I think your question is not clear.
>Please see : https://wiki.apache.org/solr/UsingMailingLists
>
>Ahmet
>
>On Tuesday, May 6, 2014 12:23 AM, "Reyes, Mark" 
>wrote:
>How could Solr accomplish an end-user behavior like a strict search?
>
>Let's say an end-user decides to use quotation marks in their keywords to
>provide specificity in their search results.
>
>Current:
>If you were to query: your future, then 10 results would return and print
>to the page.
>
>Expected:
>I'd like to query: "your future", then fewer than 10 results would return
>and print to the page.
>
>Regards,
>Mark
>
>IMPORTANT NOTICE: This e-mail message is intended to be received only by
>persons entitled to receive the confidential information it may contain.
>E-mail messages sent from Bridgepoint Education may contain information
>that is confidential and may be legally privileged. Please do not read,
>copy, forward or store this message unless you are an intended recipient
>of it. If you received this transmission in error, please notify the
>sender by reply e-mail and delete the message and any attachments. 

Strict Search in Apache Solr

2014-05-05 Thread Reyes, Mark
How could Solr accomplish an end-user behavior like a strict search?

Let’s say an end-user decides to use quotation marks in their keywords to 
provide specificity in their search results.

Current:
If you were to query: your future, then 10 results would return and print to 
the page.

Expected:
I’d like to query: “your future”, then fewer than 10 results would return and 
print to the page.

Regards,
Mark

Re: Indexing URLs for Binaries

2014-01-03 Thread Reyes, Mark
Check suffix-urlfilter.txt in your conf directory for Nutch. You might be
prohibiting those filetypes from the crawl.
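
To illustrate what such a filter does: a suffix-based URL filter rejects any outlink whose URL ends in a listed extension, so links to Word or PDF files never get fetched or indexed. The Python sketch below only mimics that rejection logic to help check which URLs would be dropped; the suffix list and helper name are hypothetical (not the contents of Nutch's shipped suffix-urlfilter.txt), and this sketch checks only the URL path.

```python
from urllib.parse import urlparse

# Hypothetical deny-list; the real suffix-urlfilter.txt in your conf directory may differ.
BLOCKED_SUFFIXES = (".pdf", ".doc", ".docx", ".xls", ".zip")


def passes_suffix_filter(url, blocked=BLOCKED_SUFFIXES):
    """Return False when the URL path ends with a blocked suffix (case-insensitive)."""
    path = urlparse(url).path.lower()
    return not path.endswith(blocked)


for u in ["http://example.com/report.PDF", "http://example.com/about.html"]:
    print(u, passes_suffix_filter(u))
```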

- Mark

On 1/3/14, 10:29 AM, "Teague James"  wrote:

>I am using Nutch 1.7 with Solr 4.6.0 to index websites that have links to
>binary files, such as Word, PDF, etc. The crawler crawls the site but I am
>not getting the URLs of the links for the binary files no matter how deep I
>set the settings for the site. I see the labels for the links in the
>content, but not the URLs. Any ideas on how I could get those URLs back in
>my crawl?
>

Proxy.php tutorials for AJAX Solr

2013-12-02 Thread Reyes, Mark
Are there any good tutorials that cover how to integrate the suggested PHP 
proxy for the JavaScript framework AJAX Solr?

Here is the proxy, https://gist.github.com/evolvingweb/298580

Also on Stackoverflow, 
http://stackoverflow.com/questions/20338073/proxy-php-tutorials-for-ajax-solr
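
The general idea behind such a proxy, and behind the server-side layer Markus Jelsma recommends in the client-side proxy thread, is that the browser talks only to the proxy, which forwards a whitelisted subset of query parameters to Solr's /select handler and never exposes :8983 directly. A minimal Python sketch of just the parameter-filtering step; the internal Solr URL, the whitelist, and the row cap are hypothetical, the actual HTTP forwarding is omitted, and this is not a port of the linked proxy.php:

```python
from urllib.parse import parse_qsl, urlencode

# Hypothetical settings: Solr listens only on an internal address the browser cannot reach.
SOLR_SELECT = "http://127.0.0.1:8983/solr/collection1/select"
ALLOWED_PARAMS = {"q", "fq", "start", "rows", "sort", "facet", "facet.field", "wt", "json.wrf"}
MAX_ROWS = 50


def build_upstream_url(client_query):
    """Forward only whitelisted parameters to Solr's /select handler and cap rows."""
    kept = []
    for key, value in parse_qsl(client_query, keep_blank_values=True):
        if key not in ALLOWED_PARAMS:
            continue  # silently drop everything else, e.g. qt, stream.url, shards
        if key == "rows":
            # Non-numeric values fall back to 10; large values are capped at MAX_ROWS.
            value = str(min(int(value), MAX_ROWS)) if value.isdigit() else "10"
        kept.append((key, value))
    return f"{SOLR_SELECT}?{urlencode(kept)}"


print(build_upstream_url("q=doctorate&rows=1000&qt=/update&wt=json"))
```

Here qt is dropped and rows is capped, so Solr only ever sees q=doctorate&rows=50&wt=json.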

Re: Client-side proxy for Solr 4.5.0

2013-11-27 Thread Reyes, Mark
What about using JSONP techniques, since the results from the Solr
instance come back as key/value pairs?
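
For context, Solr's JSON response writer supports JSONP: with wt=json, the json.wrf parameter names a JavaScript function that the response is wrapped in. With JSONP the browser still requests Solr directly, so on its own it does not keep :8983 out of the end-user's reach. A small Python sketch that builds such a URL; the callback name handleResults and the helper name are hypothetical placeholders:

```python
from urllib.parse import urlencode


def jsonp_select_url(base_url, query, callback):
    """Build a /select URL whose JSON response Solr wraps in callback(...)."""
    # wt=json selects the JSON response writer; json.wrf names the wrapper function.
    params = {"q": query, "wt": "json", "json.wrf": callback}
    return f"{base_url}/select?{urlencode(params)}"


print(jsonp_select_url("http://localhost:8983/solr/collection1", "solr", "handleResults"))
```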


On 11/26/13, 10:53 AM, "Markus Jelsma"  wrote:

>I don't think you mean a client-side proxy. You need a server-side layer
>such as a normal web application or a good proxy. We use Nginx; it is very
>fast and very feature-rich. Its config scripting is usually enough to
>restrict access and limit input parameters. We also use Nginx's embedded
>Perl and Lua scripting besides its config scripting to implement more
>difficult logic.
>
> 
> 
>-Original message-
>> From:Reyes, Mark 
>> Sent: Tuesday 26th November 2013 19:27
>> To: solr-user@lucene.apache.org
>> Subject: Client-side proxy for Solr 4.5.0
>> 
>> Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance
>>so that the end-user can see their queries w/o being able to directly
>>access :8983?
>> 
>> Applications/frameworks used:
>> - Solr 4.5.0
>> - AJAX Solr (javascript library)
>> 
>> Thank you,
>> Mark

Client-side proxy for Solr 4.5.0

2013-11-26 Thread Reyes, Mark
Are there any GOOD client-side solutions to proxy a Solr 4.5.0 instance so that 
the end-user can see their queries w/o being able to directly access :8983?

Applications/frameworks used:
- Solr 4.5.0
- AJAX Solr (javascript library)

Thank you,
Mark

Indexing data to a specific collection in Solr 4.5.0

2013-11-21 Thread Reyes, Mark
Hi all:

I’m currently on a Solr 4.5.0 instance and running this tutorial, 
http://lucene.apache.org/solr/4_5_0/tutorial.html

My question is specific to indexing data as described in this tutorial:

$ java -jar post.jar solr.xml monitor.xml

The tutorial advises validating the results from your localhost:
http://localhost:8983/solr/collection1/select?q=solr&wt=xml

However, what if my Solr instance has both collection1 and collection2, and I 
want the XML files posted to collection2 only?

If possible, please advise.

Thanks,
Mark

Re: Indexing data to a specific collection in Solr 4.5.0

2013-11-21 Thread Reyes, Mark
So then,
$ java -Durl=http://localhost:8983/solr/collection2/update -jar post.jar solr.xml monitor.xml
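
Note that -Durl is a Java system property, so it has to appear before -jar; placed after post.jar it would be passed to the tool as an ordinary argument. The same effect can be had without post.jar by POSTing the XML straight to collection2's /update handler. A minimal Python sketch, assuming Solr 4.x on localhost:8983 and the tutorial's example files; the helper names are hypothetical and error handling is limited to the exceptions urllib raises:

```python
import urllib.request

# Target only collection2's update handler (the same URL given to post.jar via -Durl).
UPDATE_URL = "http://localhost:8983/solr/collection2/update"


def make_update_request(xml_bytes, url=UPDATE_URL):
    """Build a POST request carrying Solr XML update commands."""
    return urllib.request.Request(
        url,
        data=xml_bytes,
        headers={"Content-Type": "text/xml; charset=utf-8"},
        method="POST",
    )


def post_files(paths, url=UPDATE_URL):
    """Send each XML file to the update handler, then commit once at the end."""
    for path in paths:
        with open(path, "rb") as fh:
            with urllib.request.urlopen(make_update_request(fh.read(), url)) as resp:
                resp.read()
    # post.jar commits by default; do the same so the new documents become searchable.
    with urllib.request.urlopen(make_update_request(b"<commit/>", url)) as resp:
        resp.read()
```

Calling post_files(["solr.xml", "monitor.xml"]) from the tutorial's exampledocs directory would then index those documents into collection2 only.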

On 11/21/13, 8:14 AM, "xiezhide"  wrote:

>
>add -Durl=http://localhost:8983/solr/collection2/update when you run post.jar.
>This email was sent from 189 Mail
>
>"Reyes, Mark"  wrote:
>
>>Hi all:
>>
>>I’m currently on a Solr 4.5.0 instance and running this tutorial,
>>http://lucene.apache.org/solr/4_5_0/tutorial.html
>>
>>My question is specific to indexing data as described in this tutorial:
>>
>>$ java -jar post.jar solr.xml monitor.xml
>>
>>The tutorial advises validating the results from your localhost:
>>http://localhost:8983/solr/collection1/select?q=solr&wt=xml
>>
>>However, what if my Solr instance has both collection1 and collection2,
>>and I want the XML files posted to collection2 only?
>>
>>If possible, please advise.
>>
>>Thanks,
>>Mark

Nutch 1.7 solrdedup error

2013-11-17 Thread Reyes, Mark
When trying to delete duplicates after a crawl, I get the following error:
http://pastebin.com/aQbqmPLm

This happens when running this command in the terminal:

$ bin/nutch solrdedup http://localhost:8983/solr/rockies

Here is my setup:
- Nutch 1.7
- Solr 4.5.0
- java version "1.6.0_51"

On Stackoverflow as well,
http://stackoverflow.com/questions/20013630/nutch-1-7-solrdedup-error

Thanks,
Mark

Nutch 1.7 + AJAX Solr returning ALL contents vs. SPECIFIC

2013-11-11 Thread Reyes, Mark
Hi:

I was encouraged to explore the Solr mailing list, specifically regarding the 
fl parameter. What is that parameter for, and can it accomplish my original 
task of crawling/indexing specific HTML components instead of parsing the entire 
page?

My original question is listed below (previously on the Nutch mail list):

---
I’m using Nutch 1.7 to crawl/index the pages of my domain into Solr, and the 
JavaScript library AJAX Solr to retrieve that index as JSON and print it to the 
front-end.
My question is whether it’s possible to return specific content (e.g., an H2 
tag and a p tag) on the search results page instead of all the contents of that page.
---
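
For reference, fl (field list) is a query-time parameter that controls which stored fields Solr returns for each matching document; it does not change what Nutch crawls or what gets indexed. It can trim each result down to, say, the url and title fields, but returning only an H2 or p element would first require that fragment to be indexed into its own field during parsing, which this sketch does not cover. A minimal Python sketch of a request using fl, assuming field names url and title from a Nutch-style schema (check your schema.xml) and a hypothetical helper name:

```python
from urllib.parse import urlencode


def select_with_fields(base_url, query, fields):
    """Ask Solr to return only the listed stored fields for each hit."""
    # fl takes a comma-separated list of field names.
    params = {"q": query, "fl": ",".join(fields), "wt": "json"}
    return f"{base_url}/select?{urlencode(params)}"


print(select_with_fields("http://localhost:8983/solr/collection1", "doctorate", ["url", "title"]))
```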

Thanks again,
Mark

Setting up Multiple Cores on Solr 4.5.0

2013-11-10 Thread Reyes, Mark
Any good/recent documentation that I can reference on setting up multiple cores 
in Solr 4.5.0?
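
One option in Solr 4.x is the CoreAdmin API: its CREATE action registers a new core whose instance directory already contains a conf/ directory with solrconfig.xml and schema.xml. Solr 4.4 and later can alternatively discover cores from core.properties files. A Python sketch that only builds the CREATE request URL; the core name, instanceDir value, and helper name are hypothetical placeholders:

```python
from urllib.parse import urlencode


def core_create_url(solr_root, name, instance_dir):
    """Build a CoreAdmin CREATE request; instance_dir must already hold a conf/ directory."""
    params = {"action": "CREATE", "name": name, "instanceDir": instance_dir}
    return f"{solr_root}/admin/cores?{urlencode(params)}"


print(core_create_url("http://localhost:8983/solr", "collection2", "collection2"))
```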

Thanks all,
Mark

Re: Exclude urls without 'www' from Nutch 1.7 crawl

2013-11-01 Thread Reyes, Mark
Noted, and will do (that goes for both the suggestions and putting this on
the Nutch list instead).

Thanks all,
Mark

On 11/1/13, 10:53 AM, "Furkan KAMACI"  wrote:

>As Markus pointed out, Nutch has a feature for this kind of situation. This is
>the Solr list, but one more thing for you: www.mywebsite.com and
>mywebsite.com may point to "different" pages.
>
>
>2013/11/1 Markus Jelsma 
>
>> Hi - Use the domain-urlfilter for host, domain and TLD filtering.
>>
>> Also, please ask questions on the Nutch list, you're on Solr now :)
>>
>>
>> -Original message-
>> > From:Reyes, Mark 
>> > Sent: Friday 1st November 2013 17:24
>> > To: solr-user@lucene.apache.org
>> > Subject: Exclude urls without 'www' from Nutch 1.7 crawl
>> >
>> > I'm currently using Nutch 1.7 to crawl my domain. My issue is specific
>> to URLs being indexed as www vs. non-www.
>> >
>> > Specifically, after firing the crawl and index to Solr 4.5 and then
>> validating the results on the front-end with AJAX Solr, the search results
>> page lists results/pages with both 'www' and non-'www' URLs, such as:
>> >
>> > www.mywebsite.com
>> > mywebsite.com
>> > www.mywebsite.com/page1
>> > mywebsite.com/page1
>> >
>> > My understanding is that the url filtering (regex-urlfilter.txt) needs
>> modification. Are there any regex/nutch experts that could suggest a
>> solution?
>> >
>> > Here is the code on paste bin,
>> > http://pastebin.com/Cp6vUxPR
>> >
>> > Also on stack overflow,
>> >
>> > http://stackoverflow.com/questions/19731904/exclude-urls-without-www-from-nutch-1-7-crawl
>> >
>> > Thank you,
>> > Mark
>>

Exclude urls without 'www' from Nutch 1.7 crawl

2013-11-01 Thread Reyes, Mark
I'm currently using Nutch 1.7 to crawl my domain. My issue is specific to URLs 
being indexed as www vs. non-www.

Specifically, after firing the crawl and index to Solr 4.5 and then validating 
the results on the front-end with AJAX Solr, the search results page lists 
results/pages with both 'www' and non-'www' URLs, such as:

www.mywebsite.com
mywebsite.com
www.mywebsite.com/page1
mywebsite.com/page1

My understanding is that the url filtering (regex-urlfilter.txt) needs 
modification. Are there any regex/nutch experts that could suggest a solution?
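
In regex-urlfilter.txt, each rule is a + (include) or - (exclude) sign followed by a regular expression, and the first rule that matches a URL decides whether it is kept; URLs that match no rule are ignored. Markus suggests domain-urlfilter for this, but with regex rules one approach is to accept only the www host and reject everything else. The Python sketch below simulates that first-match logic to try out candidate patterns; the rules and helper name are a hypothetical example, and Python's re module differs from Java regular expressions in some edge cases.

```python
import re

# Hypothetical rules in regex-urlfilter.txt style: (sign, pattern), first match wins.
RULES = [
    ("+", r"^https?://www\.mywebsite\.com(/|$)"),  # keep the www host only
    ("-", r"."),  # reject everything else
]


def accepts(url, rules=RULES):
    """Return True if the first matching rule is '+'; unmatched URLs are rejected."""
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


for u in ["http://www.mywebsite.com/page1", "http://mywebsite.com/page1"]:
    print(u, accepts(u))
```

In the rules file itself, this would correspond to a +^https?://www\.mywebsite\.com(/|$) line placed above a final -. line, in place of any catch-all accept rule such as +.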

Here is the code on paste bin,
http://pastebin.com/Cp6vUxPR

Also on stack overflow,
http://stackoverflow.com/questions/19731904/exclude-urls-without-www-from-nutch-1-7-crawl

Thank you,
Mark

Re: AJAX Solr returning the default wildcard *:* and not what I query

2013-10-31 Thread Reyes, Mark
I just tweaked the reuters.js example to listen to the window.location
object, and that resolved the wildcard (*:*) results.
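
The pastebin code is not reproduced here, but the general idea is to take the search term from the page URL and fall back to the match-all *:* query only when no term is present. AJAX Solr itself is JavaScript; the Python sketch below illustrates only that parse-and-fallback logic, and the q parameter name and helper name are assumptions about how the search form submits:

```python
from urllib.parse import parse_qs, urlparse

DEFAULT_QUERY = "*:*"  # match-all query used when no term is supplied


def query_from_location(href, param="q"):
    """Return the search term from a page URL, or *:* if it is missing or blank."""
    values = parse_qs(urlparse(href).query).get(param, [])
    term = values[0].strip() if values else ""
    return term or DEFAULT_QUERY


print(query_from_location("http://example.com/search?q=doctorate"))
```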

I put it on pastebin,
http://pastebin.com/GyC4RMva


Thanks for the reply everyone,
Mark
---


P. 866.475.0317 x 3244
Bridgepoint Education
INNOVATIVE SOLUTIONS THAT ADVANCE LEARNING SM

On 10/31/13, 12:23 AM, "Raymond Wiker"  wrote:

>The parameters indicate a jQuery.ajax call with result type "jsonp" - a
>

Re: AJAX Solr returning the default wildcard *:* and not what I query

2013-10-30 Thread Reyes, Mark
Here is the solr.log file from Solr 4.5:

http://pastebin.com/zSpERJZA


Thanks Shawn,
Mark



On 10/30/13, 12:44 PM, "Shawn Heisey"  wrote:

>On 10/30/2013 1:26 PM, Reyes, Mark wrote:
>> I am currently integrating the JavaScript framework AJAX Solr into my domain.
>>I am trying to query words such as 'doctorate' or 'programs', but the
>>console reports only '*:*', the default wildcard.
>>
>> Just curious if anyone has any helpful hints? The problem can be seen
>>in detail on Stackoverflow,
>> 
>>http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query
>
>We would have to know what Solr is actually receiving from your app. The
>Solr log should have an entry for every query you do, and it includes
>all of the parameters for that query. This is *not* the Logging tab in
>the admin UI, but the actual logfile.  On Solr 4.3 and later with the
>example logging setup, this is typically $CWD/logs/solr.log.
>
>Thanks,
>Shawn
>
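
When reading solr.log as Shawn suggests, the useful part of each request line is the params={...} section, which shows the q value Solr actually received. A Python sketch that extracts it; the sample line and the exact log layout are hypothetical (the layout depends on your logging configuration), the helper name is illustrative, and a query that itself contains a closing brace would need a sturdier parser:

```python
import re
from urllib.parse import parse_qs

# Non-greedy match of the params={...} block in a Solr request log line.
PARAMS_RE = re.compile(r"params=\{(.*?)\}")


def logged_query(line):
    """Extract the q parameter from a Solr request log line, or None if absent."""
    match = PARAMS_RE.search(line)
    if not match:
        return None
    values = parse_qs(match.group(1)).get("q")
    return values[0] if values else None


# Hypothetical sample line in the Solr 4.x example logging style.
sample = ("INFO  - 2013-10-30 12:00:00.000; org.apache.solr.core.SolrCore; [collection1] "
          "webapp=/solr path=/select params={q=*:*&wt=json} hits=10 status=0 QTime=1")
print(logged_query(sample))
```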

AJAX Solr returning the default wildcard *:* and not what I query

2013-10-30 Thread Reyes, Mark
I am currently integrating the JavaScript framework AJAX Solr into my domain. I am 
trying to query words such as 'doctorate' or 'programs', but the console 
reports only '*:*', the default wildcard.

Just curious if anyone has any helpful hints? The problem can be seen in detail 
on Stackoverflow,
http://stackoverflow.com/questions/19691535/ajax-solr-returning-the-default-wildcard-and-not-what-i-query

Thank you,
Mark