RE: Highlighting externally stored text

2013-07-31 Thread JohnRodey
Just an update.  The change was pretty straightforward (at least for my simple
test case); just a few lines in the getBestFragments method seemed to do the
trick.
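For anyone curious, the general idea (not the actual patch -- the analyzer and
the "content" field name below are placeholders) is to re-analyze the externally
stored text and hand the resulting TokenStream, plus the raw text, to the classic
Lucene Highlighter, roughly like this:

import java.io.StringReader;
import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.search.Query;
import org.apache.lucene.search.highlight.Highlighter;
import org.apache.lucene.search.highlight.QueryScorer;
import org.apache.lucene.search.highlight.SimpleHTMLFormatter;
import org.apache.lucene.util.Version;

public class ExternalTextHighlight {
    // Highlight text that was never stored in the index: re-analyze the
    // external text and pass the TokenStream plus the raw text to the
    // classic Highlighter.
    public static String[] highlight(Query query, String externalText) throws Exception {
        Analyzer analyzer = new StandardAnalyzer(Version.LUCENE_43);   // placeholder analyzer
        TokenStream tokens = analyzer.tokenStream("content", new StringReader(externalText));
        Highlighter highlighter = new Highlighter(new SimpleHTMLFormatter(), new QueryScorer(query));
        return highlighter.getBestFragments(tokens, externalText, 3);
    }
}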





RE: Highlighting externally stored text

2013-07-31 Thread JohnRodey
Hey Bryan, thanks for the response!  To make use of the FastVectorHighlighter
you need to enable termVectors, termPositions, and termOffsets, correct?
That takes a considerable amount of space, but it's good to know, and I may
pursue this solution as well.  I'm just starting to look at the code now;
do you remember how substantial the change was?

Are there any other options?





Luke's analysis of Trie Dates

2013-07-18 Thread JohnRodey
I have a TrieDateField dynamic field set up in my schema, pretty standard...

[schema snippet not preserved in the archive: a *_tdt dynamicField of type
tdate, indexed but not stored]

In my code I only set one field, "creation_tdt", and I round it to the
nearest second before storing it.  However, when I analyze it with Luke I
get:



  type: tdate
  schema: IT--OF--
  dynamicBase: *_tdt
  index: (unstored field)
  docs: 22404
  distinct: -1
  topTerms (the term values were not preserved in the archive; counts only):
    22404
    22404
    22404
    22404
    22404
    22404
    22404
    16014
    6390
    1535
    1459
    1268
    1193
    1187
    1152
    1129
    1089
    ...


So my question is, where are all these entries coming from?  They are not
the dates I specified, because they have millis, and my field isn't
multivalued, so the term counts don't add up (how could I have more than
22404 terms if I only have 22404 documents?).  Why multiple
"1970-01-01T00:00:00Z" entries?

Is this somehow related to Trie fields and how they are indexed?
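My working theory, sketched below in plain Java (this is NOT Lucene's actual
prefix coding, just the idea, and it assumes the stock precisionStep of 6 for
tdate -- I haven't confirmed my schema value), is that each value gets indexed
at several precisions, so the total term count is a multiple of the doc count
and the coarsest terms collapse to the same value for every document:

// Toy illustration only -- NOT Lucene's actual prefix coding.
public class TriePrecisionSketch {
    public static void main(String[] args) {
        long millis = 1311120000000L;           // some date rounded to the second
        int precisionStep = 6;                  // assumed default for tdate
        for (int shift = 0; shift < 64; shift += precisionStep) {
            long coarse = (millis >>> shift) << shift;   // low 'shift' bits dropped
            System.out.println("shift=" + shift + " -> " + coarse);
        }
        // Every document adds roughly one term per shift level, so the term
        // count exceeds the doc count, and at the largest shifts the value
        // collapses to 0 (i.e. 1970-01-01T00:00:00Z) for every document.
    }
}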

Thanks!





Highlighting externally stored text

2013-07-16 Thread JohnRodey
Does anyone know if issue SOLR-1397 ("It should be possible to highlight
external text") is actively being worked on, by chance?  It looks like the last
update was May 2012.
https://issues.apache.org/jira/browse/SOLR-1397

I'm trying to find the best way to highlight search results even though those
results are not stored in my index.  Has anyone been successful in reusing
the SOLR highlighting logic on non-stored data?  Does anyone know of any
other third-party libraries that can do this for me until SOLR-1397 is
formally released?

Thanks!





Re: Benefits of Solr over Lucene?

2013-02-12 Thread JohnRodey
So I have had a fair amount of experience using Solr.  However on a separate
project we are considering just using Lucene directly, which I have never
done.  I am trying to avoid finding out late that Lucene doesn't offer what
we need and being like "aw snap, it doesn't support geospatial"  (or
highlighting, or dynamic fields, or etc...).  I am more curious about core
index and search features, and not as much with sharding, cloud features,
different client languages and so on.

Any thoughts?

thanks





Benefits of Solr over Lucene?

2013-02-12 Thread JohnRodey
I know that Solr web-enables a Lucene index, but I'm trying to figure out
what other things Solr offers over Lucene.  On the Solr features list it
says "Solr uses the Lucene search library and extends it!", but which items
on that list are the extensions and which does Lucene already give you?
Also, if I have an index built through Solr, is there a non-HTTP way to search
that index?  Because SolrJ essentially just makes HTTP requests, correct?
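(The only non-HTTP route I'm aware of is SolrJ's EmbeddedSolrServer, which runs
a core in-process; a rough sketch along the lines of the old SolrJ wiki example --
the solr home path and the "collection1" core name are placeholders, and the
CoreContainer bootstrap differs between versions:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.embedded.EmbeddedSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.core.CoreContainer;

public class EmbeddedSearch {
    public static void main(String[] args) throws Exception {
        // Point at a Solr home containing solr.xml and the core's conf/ (placeholder path).
        System.setProperty("solr.solr.home", "/path/to/solr/home");
        CoreContainer.Initializer initializer = new CoreContainer.Initializer();
        CoreContainer container = initializer.initialize();
        // Same SolrJ API as the HTTP client, but no HTTP involved.
        EmbeddedSolrServer server = new EmbeddedSolrServer(container, "collection1");
        QueryResponse rsp = server.query(new SolrQuery("*:*"));
        System.out.println(rsp.getResults().getNumFound() + " docs");
        container.shutdown();
    }
}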

Some features I'm particularly interested in are:
Geospatial Search
Highlighting
Dynamic Fields
Near Real-Time Indexing
Multiple Search Indices 

Thanks!





Propagating accurate exceptions to the end user

2011-06-21 Thread JohnRodey
Solr 3.1, using SolrJ

So I have a GUI that allows folks to search my Solr repository, and I want to
show appropriate errors when something bad happens, but my problem is that
the Solr exceptions are not very pretty and sometimes not very
descriptive.

For instance, if I enter a bad query the message on the exception is "Error
executing query", and if I do getCause().getMessage() it gives "Bad Request
Bad Request  request: http://1.2.3.4:1234/solr/".
This really doesn't help my user too much.

Another example: if a master search server farms a request out to a bunch
of shards, I just get a Connection Refused error that doesn't specify which
connection was refused.

I can't imagine I am the first to run into this, so I was curious what others
do.  Do people just try to catch all common exceptions and print those
prettily?  What about exceptions that you don't test for?  How about
exceptions that don't really explain the real problem?
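(To illustrate the kind of thing I mean -- a rough, hand-rolled sketch that walks
the cause chain and maps the couple of cases I can recognize to friendlier text;
the matching and the messages are obviously just placeholders:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.SolrServerException;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrException;

public class FriendlyErrors {
    // Rough sketch: map the failures I happen to recognize to something a user
    // can act on; everything else falls through to a generic message.
    public static String search(SolrServer server, String userQuery) {
        try {
            QueryResponse rsp = server.query(new SolrQuery(userQuery));
            return rsp.getResults().getNumFound() + " results";
        } catch (SolrException e) {
            // Runtime exception from the server side, e.g. a query parse error.
            return "The query could not be parsed; please check the syntax.";
        } catch (SolrServerException e) {
            // The useful detail is usually buried somewhere in the cause chain.
            for (Throwable t = e; t != null; t = t.getCause()) {
                String msg = String.valueOf(t.getMessage());
                if (msg.contains("Connection refused")) {
                    return "A search node appears to be down; please retry shortly.";
                }
                if (msg.contains("Bad Request")) {
                    return "The query could not be parsed; please check the syntax.";
                }
            }
            return "Search failed: " + e.getMessage();
        }
    }
}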

Thanks!



Re: Hitting the URI limit, how to get around this?

2011-06-03 Thread JohnRodey
Yep, that was my issue.

And, like Ken said, on Tomcat I set maxHttpHeaderSize="65536".





Re: Hitting the URI limit, how to get around this?

2011-06-03 Thread JohnRodey
So here's what I'm seeing: I'm running Solr 3.1.
I'm running a Java client that executes an HttpGet (I also tried HttpPost) with a
large shard list.  If I remove a few shards from my current list it returns
fine; when I use my full shard list I get an "HTTP/1.1 400 Bad Request".  If
I execute it in Firefox with a few shards removed it returns fine; with the
full shard list I get a blank screen back immediately.

My URI works at around 7800 characters but adding one more shard to it blows
up.

Any ideas? 

I've tried using SolrJ rather than HttpGet before, but ran into similar
issues with even fewer shards.  See
http://lucene.472066.n3.nabble.com/Long-list-of-shards-breaks-solrj-query-td2748556.html

My shards are added dynamically; every few hours I am adding new shards or
cores to the cluster, so I cannot have a shard list in the config files
unless I can somehow update them while the system is running.
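(One thing I still want to try is forcing SolrJ to send the request as a POST so
the shard list travels in the body rather than the URI; a sketch, untested on 3.1
and assuming the query(SolrParams, METHOD) overload is available there:)

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class PostShardedQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new CommonsHttpSolrServer("http://mynode:8080/solr");
        SolrQuery q = new SolrQuery("test");
        // The long shard list rides in the POST body instead of the URI.
        q.setParam("shards", "node1:1010/solr/core01,node1:1010/solr/core02");
        QueryResponse rsp = server.query(q, SolrRequest.METHOD.POST);
        System.out.println("hits: " + rsp.getResults().getNumFound());
    }
}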



Re: Better to have lots of smaller cores or one really big core?

2011-06-03 Thread JohnRodey
Thanks Erick for the response.

So my data structure is the same, i.e. all the cores use the same schema.  Though
I think it makes sense for us to somehow break apart the data, for example
by the date it was indexed.  I'm just trying to get a feel for how large we
should aim to keep those (by day, by week, by month, etc.).

So it sounds like we should aim to keep them at a size that one Solr server
can host, to avoid serving multiple cores.

One question: there is no real difference (other than configuration) between a
server hosting its own index and it hosting one core, is there?

Thanks!



Better to have lots of smaller cores or one really big core?

2011-06-02 Thread JohnRodey
I am trying to decide what the right approach would be: one big core or
many smaller cores hosted by a Solr instance.

I think there may be trade-offs either way, but wanted to see what others do.
And by small I mean about 5-10 million documents; large may be 50 million.

It seems like small cores are better because
- If one server can host, say, 70 million documents (before memory issues), we
can get really close to that with a bunch of small indexes, vs. only being able
to host one 50-million-document index.  And when a software update comes out
that allows us to host 90 million, we could add a few more small
indexes.
- It takes less time to build ten 5-million-document indexes than one
50-million-document index.

It seems like larger cores are better because
- Each core returns its own result set, so if I want 1000 results and there
are 100 cores, the network is transferring up to 100,000 documents for that
search.  Whereas if I had only 10 much larger cores, only 10,000 documents
would be sent over the network.
- It would prolong my time until I hit URI length limits, since there
would be fewer cores in my system.

Any thoughts???  Other trade-offs???

How do you find what the right size for you is?



Hitting the URI limit, how to get around this?

2011-06-02 Thread JohnRodey
I have a master Solr instance that I send my requests to; it hosts no
documents, it just farms the request out to a large number of shards.  All the
other Solr instances, which host the data, contain multiple cores.

Therefore my search string looks like
"http://host:port/solr/select?...&shards=nodeA:1234/solr/core01,nodeA:1234/solr/core02,nodeA:1234/solr/core03,..."
This shard list is pretty long and has finally hit "the limit".

So my question is how to best avoid having to build such a long uri?

Is there a way to have multiple tiers, where the master server has a list of
servers (nodeA:1234,nodeB:1234,...) and each of those nodes queries the cores
that it hosts (nodeA hosts core01, core02, core03, ...)?

Thanks!



Long list of shards breaks solrj query

2011-03-29 Thread JohnRodey
So I have a simple class that builds a SolrQuery and sets the "shards" param. 
I have a really long list of shards, over 250.  My search seems to work
until I get my shard list up to a certain length. As soon as I add one more
shard I get:

org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: I/O exception (java.net.SocketException) caught when processing
request: Connection reset by peer: socket write error
org.apache.commons.httpclient.HttpMethodDirector executeWithRetry
INFO: Retrying request

My class just looks like this:
public static void main(String[] args) {
    try {
        SolrServer s = new CommonsHttpSolrServer("http://mynode:8080/solr");
        SolrQuery q = new SolrQuery();
        q.setQuery("test");
        q.setHighlight(true);
        q.setRows(50);
        q.setStart(0);
        // the long shard list is what eventually triggers the failure
        q.setParam("shards", "node1:1010/solr/core01,node1:1010/solr/core02,...");
        QueryResponse rsp = s.query(q);   // fails with the socket error once the list is too long
        System.out.println(rsp.getResults().getNumFound() + " hits");
    } catch (Exception e) {
        e.printStackTrace();
    }
}

If I execute the same request in a browser it returns fine.

One other question I had: even if I set the version to 2.2, the response
has version=1.  Is that normal?  In a browser it returns version=2.2, though.





Architecture question about solr sharding

2011-03-22 Thread JohnRodey
I have an issue and I'm wondering if there is an easy way around it with just
SOLR.

I have multiple SOLR servers and a field in my schema is a relative path to
a binary file.  Each SOLR server is responsible for a different subset of
data that belongs to a different base path.

For Example...

My directory structure may look like this:
/someDir/Jan/binaryfiles/...
/someDir/Feb/binaryfiles/...
/someDir/Mar/binaryfiles/...
/someDir/Apr/binaryfiles/...

Server1 is responsible for Jan, Server2 for Feb, etc...

And a response document may have fields like this:
  my entry
  binaryfiles/12345.bin

How can I tell from my main search server which server returned a result?
I cannot put the full path in the index because my path structure might
change in the future.  Using this example it may go to '/someDir/Jan2011/'.

I basically need to find a way to say 'Ah! server01 returned this result, so
it must be in /someDir/Jan'

Thanks!



General questions about distributed solr shards

2010-08-11 Thread JohnRodey

1) Is there any information on preferred maximum sizes for a single Solr
index?  I've read some people say 10 million, some say 80 million, etc.
Is there any official recommendation, or has anyone experimented with large
datasets into the tens of billions?

2) Is there any downside to running multiple Solr shard instances on a
single machine rather than one shard instance with a larger index per
machine?  I would think that having 5 instances with 1/5 of the index would
return results approximately 5 times faster.

3) Say you have a Solr configuration with multiple shards.  If you attempt
to query while one of the shards is down, you will receive an HTTP 500 on the
client due to a connection refused on the server.  Is there a way to tell
the server to ignore this and return as many results as possible?  In other
words, if you have 100 shards, it is possible that occasionally a process may
die, but I would still like to return results from the active shards.

Thanks


RE: Re: Disable Solr Response Formatting

2010-06-30 Thread JohnRodey

Thanks!  I was looking for things to change in the solrconfig.xml file.

indent=off
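
(For anyone finding this later, it's a plain request parameter -- e.g., assuming
the stock /select handler, something like
http://localhost:8983/solr/select?q=*:*&indent=off )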


Re: Disable Solr Response Formatting

2010-06-30 Thread JohnRodey

Oops, let me try that again...

By default my SOLR response comes back formatted, like such:

[indented, multi-line XML response example not preserved in the archive]

Is there a way to tell it to return it unformatted? Like:

[single-line XML response example not preserved in the archive]


Disable Solr Response Formatting

2010-06-30 Thread JohnRodey

By default my SOLR response comes back formatted, like such:

[indented, multi-line XML response example not preserved in the archive]

Is there a way to tell it to return it unformatted? Like:

[single-line XML response example not preserved in the archive]


Can solr return pretty text as the content?

2010-06-23 Thread JohnRodey

When I feed pretty text into solr for indexing from lucene and search for it,
the content is always returned as one long line of text.  Is there a way for
solr to return the pretty formatted text to me?


Re: Does SOLR provide a java class to perform url-encoding

2010-05-25 Thread JohnRodey

I was assuming that I needed to leave the special characters in the HTTP GET,
but running the Solr admin it looks like it converts them the same way that
URLEncoder.encode does.  What is the need to preserve special characters?

http://localhost:8983/solr/select?indent=on&version=2.2&q=%22mr.+bill%22+oh+n%3F&fq=&start=0&rows=50&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=


Re: Does SOLR provide a java class to perform url-encoding

2010-05-25 Thread JohnRodey

Thanks Sean, that was exactly what I needed.  One question though...

How do I correctly retain the Solr-specific characters?
I tried adding escape characters, but URLEncoder doesn't seem to care about them:
Example: 
String s1 = "\"mr. bill\" oh n?";
String s2 = "\\\"mr. bill\\\" oh n\\?";
String encoded1 = URLEncoder.encode(s1, "UTF-8");
String encoded2 = URLEncoder.encode(s2, "UTF-8");
System.out.println(encoded1);
System.out.println(encoded2);
Output:
%22mr.+bill%22+oh+n%3F
%5C%22mr.+bill%5C%22+oh+n%5C%3F

Should I allow the URLEncoder to translate s1, then replace %22 with ", %3F
with ?, and so on?
Or is there a better way?
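
(Partially answering my own question: a quick round-trip check suggests nothing
needs to be put back by hand -- the servlet container URL-decodes the parameter
before Solr's query parser ever sees it, so the encoded form should be fine as-is.
A small sketch:)

import java.net.URLDecoder;
import java.net.URLEncoder;

public class RoundTrip {
    public static void main(String[] args) throws Exception {
        String q = "\"mr. bill\" oh n?";
        String encoded = URLEncoder.encode(q, "UTF-8");        // what goes on the URI
        String decoded = URLDecoder.decode(encoded, "UTF-8");  // what the server ends up parsing
        System.out.println(encoded);            // %22mr.+bill%22+oh+n%3F
        System.out.println(decoded.equals(q));  // true
    }
}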


Does SOLR provide a java class to perform url-encoding

2010-05-25 Thread JohnRodey

I would like to leverage whatever SOLR provides to properly URL-encode a
search string.

For example a user enters:
"mr. bill" oh no

The URL submitted by the admin page is:
http://localhost:8983/solr/select?indent=on&version=2.2&q=%22mr.+bill%22+oh+no&fq=&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

Since the admin page uses it, I would imagine that this functionality is there,
but I'm having some trouble finding it.