Re: Solr hanging

2012-04-26 Thread Trym R. Møller

Hi Chris Hostetter

Does that mean that the last two questions I posted haven't reached
the mailing list?


Best regards Trym

On 25-04-2012 19:58, Chris Hostetter wrote:

: Subject: Solr hanging
: References:31fdac6b-c4d9-4383-865d-2faca0f09...@geekychris.com
:can4yxvff-mqoawbyow2rsf_v4tc8vpgb+z8auv-z3zp94vv...@mail.gmail.com
: In-Reply-To:
:can4yxvff-mqoawbyow2rsf_v4tc8vpgb+z8auv-z3zp94vv...@mail.gmail.com

https://people.apache.org/~hossman/#threadhijack
Thread Hijacking on Mailing Lists

When starting a new discussion on a mailing list, please do not reply to
an existing message, instead start a fresh email.  Even if you change the
subject line of your email, other mail headers still track which thread
you replied to and your question is hidden in that thread and gets less
attention.   It makes following discussions in the mailing list archives
particularly difficult.



-Hoss


Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-26 Thread pcrao
Hi,

Any more thoughts??

Thanks,
PC Rao.

--
View this message in context: 
http://lucene.472066.n3.nabble.com/EmbeddedSolrServer-and-StreamingUpdateSolrServer-tp3889073p3940383.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-26 Thread Ryan McKinley
In general -- i would not suggest mixing EmbeddedSolrServer with a
different style (unless the other instances are read only).  If you
have multiple instances writing to the same files on disk you are
asking for problems.

Have you tried just using StreamingUpdateSolrServer for daily update?
I would suspect that it would be faster than EmbeddedSolrServer
anyway.

ryan



On Wed, Apr 25, 2012 at 11:32 PM, pcrao purn...@gmail.com wrote:
 Hi,

 Any more thoughts??

 Thanks,
 PC Rao.



Re: Boosting fields in SOLR using Solrj

2012-04-26 Thread Ryan McKinley
I would suggest debugging with browser requests -- then switching to
Solrj after you are at 1st base.

In particular, try adding the debugQuery=true parameter to the
request and see what solr thinks is happening.

The value that will work for the 'qt' parameter depends on what is
configured in solrconfig.xml -- I suspect you want to point to a
requestHandler that is configured to use edismax query parser.  This
can be configured by default with:

<lst name="defaults">
  <str name="defType">edismax</str>
</lst>
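
For the browser-based debugging suggested above, the request could look like the URL built below. This is a minimal sketch using only the JDK (not Solrj); the base URL http://localhost:8983/solr is a placeholder, and the title/content fields and boosts are taken from the quoted question. Passing defType in the request is an alternative to configuring it as a handler default.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;
import java.util.LinkedHashMap;
import java.util.Map;

class EdismaxUrlSketch {
    // Percent-encode one parameter name or value for a query string.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Parameters for an edismax request with per-field boosts and debug output.
    static Map<String, String> edismaxParams(String userQuery) {
        Map<String, String> p = new LinkedHashMap<String, String>();
        p.put("q", userQuery);                    // plain user input, no field prefixes
        p.put("defType", "edismax");              // select the query parser
        p.put("qf", "title^10.0 content^1.0");    // fields and boosts from the question
        p.put("debugQuery", "true");              // ask Solr to explain the parsed query
        return p;
    }

    // Join the parameters into a /select URL; baseUrl is a placeholder.
    static String buildUrl(String baseUrl, Map<String, String> params) {
        StringBuilder sb = new StringBuilder(baseUrl).append("/select?");
        boolean first = true;
        for (Map.Entry<String, String> e : params.entrySet()) {
            if (!first) sb.append('&');
            sb.append(enc(e.getKey())).append('=').append(enc(e.getValue()));
            first = false;
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(buildUrl("http://localhost:8983/solr", edismaxParams("apples oranges")));
    }
}
```

Once the URL returns the expected ranking in a browser, the same parameter names can be set on the SolrQuery object.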

ryan


On Wed, Apr 25, 2012 at 3:57 PM, Joe joe.pol...@gmail.com wrote:
 Hi,

 I'm using the solrj API to query my SOLR 3.6 index. I have multiple text
 fields, which I would like to weight differently. From what I've read, I
 should be able to do this using the dismax or edismax query types. I've
 tried the following:

 SolrQuery query = new SolrQuery();
 query.setQuery("title:apples oranges content:apples oranges");
 query.setQueryType("edismax");
 query.set("qf", "title^10.0 content^1.0");
 QueryResponse rsp = m_Server.query(query);

 But this doesn't work. I've tried the following variations to set the query
 type, but it doesn't seem to make a difference.

 query.setQueryType("dismax");
 query.set("qt", "dismax");
 query.set("type", "edismax");
 query.set("qt", "edismax");
 query.set("type", "dismax");

 I'd like to retain the full Lucene query syntax, so I prefer ExtendedDisMax
 to DisMax. Boosting individual terms in the query (as shown below) does
 work, but is not a valid solution, since the queries are automatically
 generated and can get arbitrarily complex in syntax.

 query.setQuery("title:apples^10.0 oranges^10.0 content:apples oranges");

 Any help would be much appreciated.



Re: Boosting fields in SOLR using Solrj

2012-04-26 Thread Michael Kuhlmann

On 26.04.2012 00:57, Joe wrote:

Hi,

I'm using the solrj API to query my SOLR 3.6 index. I have multiple text
fields, which I would like to weight differently. From what I've read, I
should be able to do this using the dismax or edismax query types. I've
tried the following:

SolrQuery query = new SolrQuery();
query.setQuery("title:apples oranges content:apples oranges");
query.setQueryType("edismax");
query.set("qf", "title^10.0 content^1.0");
QueryResponse rsp = m_Server.query(query);


Why do you try to construct your own query, when you're using an edismax 
query with a defined qf parameter?


What you're searching for is the text "title:apples oranges content:apples
oranges". Depending on your analyzer chain, "title:apples" and
"content:apples" might each be kept as one token, so nothing is found because
there's no such token in the index.


Why don't you simply query for "apples oranges"? That's what (e)dismax is
made for. Have a closer look at http://wiki.apache.org/solr/DisMax.


BTW, if you used the above query with the Lucene parser, it would look for
"apples" in the title and content fields, but look for "oranges" in your
default search field. This is because you didn't quote "apples oranges".
Since you want to use edismax, you can ignore this; it's just that your
current query won't work as expected in either case.
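
The field scoping described above can be illustrated with a deliberately simplified sketch. It is not the Lucene query parser: it only splits on whitespace and ignores quotes, boosts and operators, and the default field name "text" is an assumption. It shows that a field prefix applies to the single term it is attached to.

```java
import java.util.ArrayList;
import java.util.List;

class FieldScopeSketch {
    // Simplified illustration only: whitespace split, no quotes, boosts or operators.
    static List<String> resolve(String query, String defaultField) {
        List<String> clauses = new ArrayList<String>();
        for (String token : query.trim().split("\\s+")) {
            if (token.isEmpty()) continue;
            // A "field:" prefix covers this one term; bare terms fall back
            // to the default search field.
            clauses.add(token.contains(":") ? token : defaultField + ":" + token);
        }
        return clauses;
    }

    public static void main(String[] args) {
        System.out.println(resolve("title:apples oranges content:apples oranges", "text"));
    }
}
```

In this simplified model "oranges" ends up in the default field both times, which matches the behaviour described for the unquoted query.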


-Kuli


Re: EmbeddedSolrServer and StreamingUpdateSolrServer

2012-04-26 Thread pcrao
Hi Ryan,

I see.

Yes, for incremental (hourly) indexing we use StreamingUpdateSolrServer,
and it is faster than EmbeddedSolrServer.

We also use EmbeddedSolrServer for full indexing on a daily basis; it is
efficient for full indexing because it handles a large number of documents
better.

Thanks,
PC Rao.



solr error after relacing schema.xml

2012-04-26 Thread BillB1951
Trouble getting Solr and Haystack working together.  I have Solr
working (can get admin screen and query test data).  I then create my
search_indexes.py (per the getting started example; I have also run syncdb
and added data to the Notes table).  I run manage.py
build_solr_schema, which generates XML for the schema.xml file. I
replace the contents of schema.xml with the new content.  I
restart the Solr server, then try to open the admin page but get the following
error:

HTTP ERROR 500

Problem accessing /solr/admin/. Reason:

Severe errors in solr configuration.

Check your log files for more detailed information on what may be
wrong.

If you want solr to continue after configuration errors, change:

 <abortOnConfigurationError>false</abortOnConfigurationError>

in solr.xml

-
org.apache.solr.common.SolrException: No cores were created, please
check the logs for errors

What am I doing wrong?

-
BillB1951


Using Customized sorting in Solr

2012-04-26 Thread solr user
Hi,

We are planning to move the search for one of our listing-based portals from
the Sphinx search server to Solr/Lucene, but we are facing a challenge in
porting the customized sorting used in our portal. We only keep the last
60 days of data live. The algorithm is as follows:

   1.  Put all listings into 54 buckets (date buckets for 60 days), i.e.
   buckets of 7 days, 1 day, 1 day, ...
   2.  For each date bucket, make 2 buckets (paid / free).
   3.  For each paid / free bucket, cycle through the advertisers on a
   uniqueness basis,

  i.e. inside a bucket the ordering should be the 1st listing
of each advertiser, then the 2nd listing of each advertiser, and so on.
  In other words, within a *sub-bucket* the second listing of an
advertiser is displayed only after the first listing of every advertiser has
been displayed.

To take care of points 1 and 2 we create a field named bucket_index
when indexing the data and sort the results by this field,
but we have not found a way to create a sort field at index time, or
thought of a sort function, for point 3.  Please suggest whether there is a
way to do this in Solr.
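
One possible approach to point 3, sketched here under assumptions rather than taken from the thread: compute, at index time, each listing's ordinal among the same advertiser's listings within its bucket, store it in a second sort field (called advertiser_rank here, a hypothetical name), and sort by bucket_index first and that rank second. The Listing class and its fields are illustrative only.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class AdvertiserRankSketch {
    static final class Listing {
        final String id;
        final int bucketIndex;   // existing bucket_index value (points 1 and 2)
        final String advertiser;
        int advertiserRank;      // hypothetical new sort field (point 3)

        Listing(String id, int bucketIndex, String advertiser) {
            this.id = id;
            this.bucketIndex = bucketIndex;
            this.advertiser = advertiser;
        }
    }

    // Give each listing its 1-based ordinal among the listings of the same
    // advertiser in the same bucket, in input order.
    static void assignRanks(List<Listing> listings) {
        Map<String, Integer> seen = new HashMap<String, Integer>();
        for (Listing l : listings) {
            String key = l.bucketIndex + "|" + l.advertiser;
            Integer count = seen.get(key);
            int next = (count == null) ? 1 : count + 1;
            seen.put(key, next);
            l.advertiserRank = next;
        }
    }

    // In-memory equivalent of sorting by bucket_index asc, advertiser_rank asc.
    static List<String> sortedIds(List<Listing> listings) {
        List<Listing> copy = new ArrayList<Listing>(listings);
        Collections.sort(copy, new Comparator<Listing>() {
            public int compare(Listing a, Listing b) {
                if (a.bucketIndex != b.bucketIndex) {
                    return Integer.compare(a.bucketIndex, b.bucketIndex);
                }
                return Integer.compare(a.advertiserRank, b.advertiserRank);
            }
        });
        List<String> ids = new ArrayList<String>();
        for (Listing l : copy) ids.add(l.id);
        return ids;
    }
}
```

A limitation of this sketch: the rank is fixed at index time, so when a query matches only some of an advertiser's listings, the first matching listing may carry a rank greater than 1 and appear later than intended.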

Tia,

BC Rathore


Re: Dynamic creation of cores for this use case.

2012-04-26 Thread Erick Erickson
Take a look here:
http://wiki.apache.org/solr/CoreAdmin?highlight=%28create%29%7C%28core%29#CREATE

I'm not sure about your partners view. As an alternative to
creating individual cores, you could simply use a single
index and use filter queries (fq) to restrict the
selection to the relevant customers. If you also included
a field with a value for which partner group the customer
belonged to, you could use a similar technique there.
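
A rough sketch of the single-index alternative described above, using only the JDK to build the query string. The field names customer_id and partner_id are assumptions for illustration; they would have to exist in the schema and be populated at index time.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class FilterQuerySketch {
    // Percent-encode one parameter value for a query string.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Restrict a search to one customer; customer_id is a hypothetical field.
    static String customerQuery(String userQuery, String customerId) {
        return "q=" + enc(userQuery) + "&fq=" + enc("customer_id:" + customerId);
    }

    // Restrict a search to one partner group; partner_id is a hypothetical field.
    static String partnerQuery(String userQuery, String partnerId) {
        return "q=" + enc(userQuery) + "&fq=" + enc("partner_id:" + partnerId);
    }

    public static void main(String[] args) {
        System.out.println(customerQuery("router config", "c42"));
        System.out.println(partnerQuery("router config", "p7"));
    }
}
```

The application would add the fq value itself, based on who is logged in, rather than accept it from user input.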

Best
Erick

On Wed, Apr 25, 2012 at 5:13 AM, pprabhcisco123 ppr...@gmail.com wrote:
 Hi everyone ,

  I am new to  solr. I have a use case which describes as below,

  I have a use case to create cores based on customers and partners. There
 are about 2500 customers each customers having on an average of 1
 devices.
   The partner is a group of customers say 30 . So , we have customer column
 and partner column.
   So, the use case is to
   1.   Create a VM running solr, with one core per customer.
   2.   Index all of each customer's data (device info , config text,
 metadata, etc) into a single core.
   3.   Index all 30 customer's data into the partners view after creating
 parnter per 30 customer.


  Can any one please help me on this case ?

 Thanks
 Prabhakaran. P



Re: Stats.facet on date returns error

2012-04-26 Thread Erick Erickson
Works on my machine (tm). I tried both trunk and 3.6, so I guess that
means we need more details.

What version are you running on? What is your exact URL? Did you
do anything like change the definition without blowing away your
index and re-indexing? Have you tried using Luke or the schema
browser (or terms component) to examine your data and see if it's
odd?

Best
Erick

On Wed, Apr 25, 2012 at 5:53 PM, Peter Markey sudoma...@gmail.com wrote:
 Hello,

 I have been trying stats.facet option on a tdate field and I end up getting
 an error even though solr has only proper values for all the docs for the
 date field. It happens for any type of trie-field. Any help would be
 appreciated.

 My date field is defined as:
 <field name="doc_time" type="tdate" indexed="true" stored="false"/>


 error:

 SEVERE: org.apache.solr.common.SolrException: Invalid Date String:'M\-r/'
 at org.apache.solr.schema.DateField.parseMath(DateField.java:168)
 at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:321)
 at org.apache.solr.schema.TrieField.readableToIndexed(TrieField.java:300)
 at org.apache.solr.schema.TrieField.toInternal(TrieField.java:330)
 at org.apache.solr.schema.TrieDateField.toInternal(TrieDateField.java:102)
 at org.apache.solr.request.UnInvertedField.getStats(UnInvertedField.java:609)
 at org.apache.solr.handler.component.SimpleStats.getStatsFields(StatsComponent.java:235)
 at org.apache.solr.handler.component.SimpleStats.getStatsCounts(StatsComponent.java:211)
 at org.apache.solr.handler.component.StatsComponent.process(StatsComponent.java:70)
 at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:204)
 at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
 at org.apache.solr.core.SolrCore.execute(SolrCore.java:1540)
 at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:435)
 at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:256)
 at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
 at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
 at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
 at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:169)
 at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
 at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
 at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
 at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
 at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
 at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
 at org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:987)
 at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:579)
 at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:309)
 at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
 at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
 at java.lang.Thread.run(Thread.java:662)


auto warm up cache and new data

2012-04-26 Thread mizayah
Please help me understand this.
What will happen if I have cached data that changes after a commit, and I
have autowarming set up?
Will the old cached data still be accessible in the cache, so that I get old data?

Does that mean that if autowarming copies all needed data to the new cache, I will
probably never see new data?
A cache in Solr expires only when its searcher goes down, right?



Re: auto warm up cache and new data

2012-04-26 Thread Tomás Fernández Löbbe
The warmup process reloads the data from the new index.

Correct, a cache in Solr expires when a new searcher is opened. Entries
can also be evicted if the cache fills up.
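
A toy model of the warm-up step may make this clearer. It is not Solr's cache code: the Searcher interface, the warm method and the use of plain maps are all assumptions for illustration, and which entries get warmed first is simplified. The point it shows is that only the keys (queries) of the old cache are reused, while the values are recomputed against the new index.

```java
import java.util.LinkedHashMap;
import java.util.Map;

class AutowarmSketch {
    // Stand-in for a searcher over one index snapshot.
    interface Searcher {
        String search(String query);
    }

    // Populate the new searcher's cache by re-running up to autowarmCount
    // keys from the old cache against the new searcher. Old values are
    // never copied, so stale results do not survive the commit.
    static Map<String, String> warm(Map<String, String> oldCache,
                                    Searcher newSearcher,
                                    int autowarmCount) {
        Map<String, String> newCache = new LinkedHashMap<String, String>();
        int warmed = 0;
        for (String query : oldCache.keySet()) {
            if (warmed++ >= autowarmCount) break;
            newCache.put(query, newSearcher.search(query));
        }
        return newCache;
    }
}
```

Queries that were not warmed are simply cache misses on the new searcher and are answered from the new index on first use.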

On Thu, Apr 26, 2012 at 8:33 AM, mizayah miza...@gmail.com wrote:

 Please help me understand that.
 What wil happen if if have cached data and thay change after comit and i
 have autowarm set up.
 Old cached data will be still accesible in cache so i will get old data?

 That means if autowarm copy all needed data to new cache probably i will
 never see new data?
 Cache in solr expire only with searcher going down right?




Solr for routing a webapp

2012-04-26 Thread Björn Zapadlo
Hello,

I'm thinking about using a Solr index for routing a webapp.

I have pregenerated base urls in my index. E.g.
/foo/bar1
/foo/bar2
/foo/bar3
/foo/bar4
/bar/foo1
/bar/foo2
/bar/foo3

I am trying to find a way to match /foo/bar3/parameter1/value1/parameter2/value2
without knowing in advance that the parameters and values are not part of the base URL. In effect,
I need the best match from the beginning of the path.
Is that possible, and are there any performance issues?
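
The matching rule described here (the longest known base URL that is a prefix of the request path) can be sketched outside Solr as follows. This only illustrates the rule, not how a Solr query would implement it; the in-memory set of base URLs is an assumption standing in for the index.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

class BaseUrlMatchSketch {
    // Return the longest known base URL that is a path-segment prefix of
    // requestPath, or null if none matches.
    static String longestBase(Set<String> baseUrls, String requestPath) {
        String candidate = requestPath;
        while (!candidate.isEmpty()) {
            if (baseUrls.contains(candidate)) return candidate;
            int slash = candidate.lastIndexOf('/');
            if (slash < 0) break;
            candidate = candidate.substring(0, slash); // drop the last segment
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> bases = new HashSet<String>(Arrays.asList(
                "/foo/bar1", "/foo/bar2", "/foo/bar3", "/bar/foo1"));
        System.out.println(longestBase(bases, "/foo/bar3/parameter1/value1/parameter2/value2"));
    }
}
```

The same candidate prefixes could, in principle, be sent to Solr as exact-match terms against an untokenized field, keeping the longest hit.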

I hope my problem is understandable!

Thanks in advance and best regards,
Bjoern


Format content field

2012-04-26 Thread webdev1977
Greetings all!

I have created an enterprise search architecture that includes both Nutch for
crawling and Solr for indexing.  I was so focused on the Nutch part
that I didn't realize that my user interface (jQuery based) was lacking in
appeal.

One of my issues is the format of the text in the content field.  Is there
any way to force it to include spaces, etc. in the text?

for instance, this is an example of a value:

thereisno way to know.Next sentence goes here.BUT I am all squished  

This is sample content from an HTML page.





Re: Dynamic creation of cores for this use case.

2012-04-26 Thread pprabhcisco123
Hi,

Thanks, Erick, for your response.  Actually, the total number of customers is
4500, and every group of about 30 customers is considered to be a partner or
agent.

 The use case is to create a core for each customer as well as for each partner.
Since it is very difficult to create cores statically in the solr.xml file for all
4500 customers, is there any way to create the cores dynamically, on the
fly?

 We are introducing Solr into an application that is already live; in
the application, customers or partners around the world log in and
search their devices. Our manager therefore wants us to do a small proof of
concept showing how Solr will fit into the application.

  I am very worried about this use case, as it has already taken longer than
the deadline allowed, so please help me with this.

Thanks




Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-26 Thread geeky2
hello,

sorry - i overlooked this message - thanks for checking back and thanks for
the info.

yes - replication seems to be working now:

tailed from logs just now:

2012-04-26 09:21:33,284 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-26 09:21:53,279 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-26 09:22:13,279 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.
2012-04-26 09:22:33,279 INFO  [org.apache.solr.handler.SnapPuller]
(pool-12-thread-1) Slave in sync with master.



 



Re: Dynamic creation of cores for this use case.

2012-04-26 Thread Michael Kuhlmann

On 26.04.2012 16:17, pprabhcisco123 wrote:

  The use case is to create a core for each customer as well as partner .
Since its very difficult to create cores statically in solr.xml file for all
4500 customers , is there any way to create the cores dynamically or on the
fly.


Yes there is. Have a look at: http://wiki.apache.org/solr/CoreAdmin#CREATE

I suggest setting the persistent flag in solr.xml to true.

I think all your cores will share the same configuration, so you can 
point all configuration directories to the same one, and install unique 
data dirs.
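
A sketch of what such a CREATE request could look like, built with the JDK only. The base URL, the core name customer42 and the shared instance directory name are placeholders; name, instanceDir and dataDir are parameters listed on the CoreAdmin wiki page, but how a relative dataDir is resolved should be checked against the Solr version in use.

```java
import java.io.UnsupportedEncodingException;
import java.net.URLEncoder;

class CoreCreateUrlSketch {
    // Percent-encode one parameter value for a query string.
    static String enc(String value) {
        try {
            return URLEncoder.encode(value, "UTF-8");
        } catch (UnsupportedEncodingException e) {
            throw new IllegalStateException("UTF-8 is always supported", e);
        }
    }

    // Build a CoreAdmin CREATE URL in which every core shares one instance
    // (configuration) directory but gets its own data directory.
    static String createCoreUrl(String solrBase, String coreName, String sharedInstanceDir) {
        return solrBase + "/admin/cores?action=CREATE"
                + "&name=" + enc(coreName)
                + "&instanceDir=" + enc(sharedInstanceDir)
                + "&dataDir=" + enc("data/" + coreName);
    }

    public static void main(String[] args) {
        System.out.println(createCoreUrl("http://localhost:8983/solr", "customer42", "shared"));
    }
}
```

Issuing one such request per customer from a provisioning script avoids editing solr.xml by hand.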


This should be relatively simple in theory. In practice, you might run into
performance issues with such a configuration. It should be no big
problem if at most a few hundred users work in parallel, but as soon as
most cores are in use at the same time, I predict you'll have bad performance.


Solr has no hard-coded limit on the number of cores, but each core
has its own caches and readers. Depending on your machine configuration,
this may be too much.


My suggestion is to try it out. It should work at first, and if you
hit performance limits, you can then modify your configuration.


-Kuli


Re: Solr for routing a webapp

2012-04-26 Thread Paul Libbrecht
Have you tried using mod_rewrite for this?

paul


On 26 Apr 2012, at 15:16, Björn Zapadlo wrote:

 Hello,
 
 I'm thinking about using a Solr index for routing a webapp.
 
 I have pregenerated base urls in my index. E.g.
 /foo/bar1
 /foo/bar2
 /foo/bar3
 /foo/bar4
 /bar/foo1
 /bar/foo2
 /bar/foo3
 
 I try to find a way to match /foo/bar3/parameter1/value1/parameter2/value2 
 without knowing that parameter and value are not part of the base url. In 
 fact I need the best hit from the beginng.
 Is that possible and are there any performance issues?
 
 I hope my problem is understandable!
 
 Thanks in advance and best regards,
 Bjoern



impact of EdgeNGramFilterFactory on indexing process?

2012-04-26 Thread geeky2

Hello all,

i am experimenting with EdgeNGramFilterFactory - on two of the fieldTypes in
my schema.

   <filter class="solr.EdgeNGramFilterFactory" minGramSize="3"
           maxGramSize="15" side="front"/>

i believe i understand this - but want to verify:

1) will this increase my index time?
2) will this increase the number of documents in my index?
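
To make the effect concrete: an edge n-gram filter emits several terms per token, so it increases the number of indexed terms (and with that the index size and, usually, the indexing time), not the number of documents. The sketch below mimics front-side gram generation for the configured sizes; it is an illustration, not the filter's source, and the real filter may differ in details.

```java
import java.util.ArrayList;
import java.util.List;

class EdgeNGramSketch {
    // Produce the front-anchored prefixes of token whose length lies
    // between minGram and maxGram (inclusive).
    static List<String> frontEdgeNGrams(String token, int minGram, int maxGram) {
        List<String> grams = new ArrayList<String>();
        int longest = Math.min(maxGram, token.length());
        for (int len = minGram; len <= longest; len++) {
            grams.add(token.substring(0, len));
        }
        return grams;
    }

    public static void main(String[] args) {
        // One input token becomes several index terms.
        System.out.println(frontEdgeNGrams("replication", 3, 15));
    }
}
```

In this model a token shorter than minGramSize yields no grams at all, and an 11-character token yields nine.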

thank you



Re: solr error after relacing schema.xml

2012-04-26 Thread Mark Miller
Try looking at the logs.

On Apr 25, 2012, at 10:53 PM, BillB1951 wrote:

 Trouble getting solr and Haystack working together.  I have solar
 working (can get admin screen and query test data).  I then create my
 search_indexes.py (per getting started example, I also have run syndb
 and added data to the Notes table).  I run manage.py
 build_solar_schema, it generates XML for the schema.xml file. I
 replace the contents of the schema.xml with the new content.  I
 restart the solr server, then try to start admin  but get following
 error
 
 HTTP ERROR 500
 
 Problem accessing /solr/admin/. Reason:
 
Severe errors in solr configuration.
 
 Check your log files for more detailed information on what may be
 wrong.
 
 If you want solr to continue after configuration errors, change:
 
  <abortOnConfigurationError>false</abortOnConfigurationError>
 
 in solr.xml
 
 -
 org.apache.solr.common.SolrException: No cores were created, please
 check the logs for errors
 
 what am I doing wrong. 
 
 -
 BillB1951

- Mark Miller
lucidimagination.com


Re: Recovery - too many updates received since start

2012-04-26 Thread Mark Miller

On Apr 24, 2012, at 9:31 AM, Trym R. Møller wrote:

 Hi
 
 I am seeing a Solr instance lose its connection to ZooKeeper and
 re-establish it. After Solr has reconnected to ZooKeeper, it begins to recover.
 The connection was lost for approximately 10 seconds, and meanwhile the
 leader slice received some documents (maybe about 1000 documents). Solr
 fails to peer sync, with the log message:
 Apr 21, 2012 10:13:40 AM org.apache.solr.update.PeerSync sync
 WARNING: PeerSync: core=mycollection_slice21_shard1 
 url=zk-1:2181,zk-2:2181,zk-3:2181 too many updates received since start - 
 startingUpdates no longer overlaps with our currentUpdates

You can configure the timeout here - I may have chosen a default that is too 
low based on some reports.

 
 Looking into PeerSync and UpdateLog I can see that 100 updates is the maximum 
 allowed updates that a shard can be behind.
 Is it correct that this is not configurable and what is the reasons for 
 choosing 100?

Yonik chose this - I'll let him expand on it if he sees this. I think it's not
configurable currently, but perhaps, with the right caveats documented, it should be.


 
 I suspect that one must compare the work needed to replicate the full index 
 with the performance loss/resource usage when enhancing the size of the 
 UpdateLog?

Yeah, I think that is the gist of it. There may be another gotcha or two; I
just don't remember at the moment. Yonik?

 
 Any comments regarding this is greatly appreciated.
 
 Best regards Trym

- Mark Miller
lucidimagination.com


Re: Question on Facet counts by grouped results

2012-04-26 Thread Sohail Aboobaker
Never mind, I did not notice that this is coming in Solr 4.0. Any
ideas on when Solr 4.0 will be out?

Sohail


Re: solr error after relacing schema.xml

2012-04-26 Thread BillB1951
It does not appear that any logfiles were created.


-
BillB1951


Re: solr error after relacing schema.xml

2012-04-26 Thread Mark Miller
By default logging goes to stdout. You probably want to configure real logging
though: http://wiki.apache.org/solr/SolrJetty#Logging

On Apr 26, 2012, at 1:33 PM, BillB1951 wrote:

 It does not appear that any logfiles were created.
 
 
 -
 BillB1951

- Mark Miller
lucidimagination.com


Re: solr replication failing with error: Master at: is not available. Index fetch failed

2012-04-26 Thread Mark Miller

On Apr 23, 2012, at 12:10 PM, geeky2 wrote:

 http://someip:someport/somepath/somecore/admin/replication/ is not
 available. Index fetch failed. Exception: Invalid version (expected 2, but
 10) or the data in not in 'javabin' format

This is kind of a bug. When Solr tries to talk in javabin and gets an http 
response instead (like a 404 response - what this likely is) it does this. 
Really it should detect this case and give you the proper error. I almost think 
someone made this change already in trunk based on what I was seeing yesterday, 
but I'm not sure.

- Mark Miller
lucidimagination.com


Re: Question on Facet counts by grouped results

2012-04-26 Thread Mark Miller

On Apr 26, 2012, at 1:24 PM, Sohail Aboobaker wrote:

  Any
 ideas on when Solr 4.0 will be out?

We are hoping this year. There will be a series of alphas and betas that should 
start within a month or few.

- Mark Miller
lucidimagination.com


Setting FuzzyConfig's prefixLength ?

2012-04-26 Thread Phill Tornroth
I'd like to change Lucene's FuzzyConfig prefixLength from its default
value of 0. Is there a way to configure that via Solr somehow? I've noticed
references on the list to people recompiling lucene from source in order to
change this value, and I'm hoping not to need to resort to the same.

Thanks in advance,
Phill


searchable solr user mail archive

2012-04-26 Thread Sohail Aboobaker
Hi,

Is there a searchable archive for solr user emails available somewhere
to avoid questions already asked on list?

Sohail


Re: solr error after relacing schema.xml

2012-04-26 Thread Lance Norskog
Which version of Solr does Haystack expect? The schema builder might
be targeting an older version of Solr.

On Thu, Apr 26, 2012 at 10:47 AM, Mark Miller markrmil...@gmail.com wrote:
 By default logging goes to std out. You probably want to configure real 
 logging though: http://wiki.apache.org/solr/SolrJetty#Logging

 On Apr 26, 2012, at 1:33 PM, BillB1951 wrote:

 It does not appear that any logfiles were created.


 -
 BillB1951

 - Mark Miller
 lucidimagination.com

-- 
Lance Norskog
goks...@gmail.com


Re: searchable solr user mail archive

2012-04-26 Thread Chris Hostetter

: Is there a searchable archive for solr user emails available somewhere
: to avoid questions already asked on list?

https://wiki.apache.org/solr/SolrResources#Mailing_List_Archives

Or just use the search box in the top right corner of the main solr 
website...

http://lucene.apache.org/solr/


-Hoss


Re: QueryElevationComponent and distributed search

2012-04-26 Thread srinir
Can anyone help me understand the fix to QueryElevationComponent (in
Solr 4.0) that makes it work for distributed search?



Re: Solr for routing a webapp

2012-04-26 Thread Paul Libbrecht
Or write your own query component: map /solr/* in web.xml, expose the
request via a thread-local set by a filter, and read it to set the
appropriate query parameters...

Performance-wise, this seems quite reasonable I think.

paul


On 26 Apr 2012, at 16:58, Paul Libbrecht wrote:

 Have you tried using mod_rewrite for this?
 
 paul
 
 
 On 26 Apr 2012, at 15:16, Björn Zapadlo wrote:
 
 Hello,
 
 I'm thinking about using a Solr index for routing a webapp.
 
 I have pregenerated base urls in my index. E.g.
 /foo/bar1
 /foo/bar2
 /foo/bar3
 /foo/bar4
 /bar/foo1
 /bar/foo2
 /bar/foo3
 
 I try to find a way to match /foo/bar3/parameter1/value1/parameter2/value2 
 without knowing that parameter and value are not part of the base url. In 
 fact I need the best hit from the beginng.
 Is that possible and are there any performance issues?
 
 I hope my problem is understandable!
 
 Thanks in advance and best regards,
 Bjoern
 



Re: solr error after relacing schema.xml

2012-04-26 Thread BillB1951
I'm using haystack 2.0.0Beta, and Apache-Solr-3.6.0.  

I'm not sure how to determine the schema.xml version, but I do notice that
the solr example's  schema.xml is 

<schema name="example" version="1.5">, and the schema.xml generated by
Haystack is

<schema name="default" version="1.4">.  Can I specify another schema.xml
generator for Haystack?  If so, where?




-
BillB1951


Re: Question on Facet counts by grouped results

2012-04-26 Thread Sohail Aboobaker
Hi,

I am trying nightly build for solr 4.0. I downloaded the build and am
able to start it. In 3.x, I copied the example directory and updated
the schema.xml. It worked fine but in 4.0, I did the same thing (make
a copy of example) but when I change the schema, I get following:

Apr 26, 2012 5:04:12 PM org.apache.solr.common.SolrException log
SEVERE: null:java.lang.RuntimeException: Can't find resource
'stopwords_en.txt' in classpath or 'solr/./conf/',
cwd=/apps/servers/apache-solr-4.0-2012-04-26_08-10-58/apache-solr-4.0-2012-04-26_08-10-58/trusted

Do I need to copy some other files into my copied directory as well?

Sohail


Re: Solr4 CoreContainer failed to load with older version of Slf4j 1.5.2

2012-04-26 Thread Mark Miller
I also ran into a problem using 1.6.1 - that's the breaks of progress, I
guess ;)

On Thu, Apr 26, 2012 at 4:07 PM, Gopal Patwa gopalpa...@gmail.com wrote:

 I am using Solr4 nightly build apache-solr-4.0-2012-04-26_08-10-58 and I
 saw Slf4j version was upgraded to 1.6.4 and it is failing now to start
 Solr, if I want to use previous version of Slf4j version like 1.5.2

 12:43:48,913 ERROR [SolrDispatchFilter] Could not start Solr. Check
 solr/home property and the logs
 12:43:48,944 ERROR [SolrCore] null:java.lang.NoSuchMethodError:

 org.slf4j.impl.StaticLoggerBinder.getSingleton()Lorg/slf4j/impl/StaticLoggerBinder;
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:395)
 at org.apache.solr.core.CoreContainer.load(CoreContainer.java:355)
 at

 org.apache.solr.core.CoreContainer$Initializer.initialize(CoreContainer.java:304)


 When I investigated the code, I found it is using new method
 org.slf4j.impl.StaticLoggerBinder.getSingleton() , which was added in
 CoreContainer class during initialize logging but this method is not
 available in Slf4j 1.5.2 version


 Thanks
 Gopal




-- 
- Mark

http://www.lucidimagination.com


HTTP Auth and Distributed Search?

2012-04-26 Thread Michael Della Bitta
Hi,

I'm wondering if there's any way to use container-based HTTP auth and
Distributed Search configured in the SearchHandler that I haven't
discovered aside from writing my own shard handler implementation.

Thanks,

Michael



Re: HTTP Auth and Distributed Search?

2012-04-26 Thread Mark Miller

On Apr 26, 2012, at 5:25 PM, Michael Della Bitta wrote:

 Hi,
 
 I'm wondering if there's any way to use container-based HTTP auth and
 Distributed Search configured in the SearchHandler that I haven't
 discovered aside from writing my own shard handler implementation.
 
 Thanks,
 
 Michael
 


I think there is an ugly global way to support this by setting some global 
properties for HttpClient. I can't remember clearly offhand though.

We should add explicit support for this I think - just like we have for 
replication.

- Mark Miller
lucidimagination.com


Re: HTTP Auth and Distributed Search?

2012-04-26 Thread Michael Della Bitta
Really? Is that in a .properties file somewhere, or would I have to do
it in code?

I was sort of hoping I'd be able to add the credentials to the URL in
the shards field, but looking at the source, that won't fly. While we're
on the topic, it might be nice to be able to specify the connection
scheme, too (e.g. for HTTPS).

I'd be willing to make a patch if there's a decision on the way this
should work.

Thanks,

Michael

On Thu, 2012-04-26 at 17:55 -0400, Mark Miller wrote:
 On Apr 26, 2012, at 5:25 PM, Michael Della Bitta wrote:
 
  Hi,
  
  I'm wondering if there's any way to use container-based HTTP auth and
  Distributed Search configured in the SearchHandler that I haven't
  discovered aside from writing my own shard handler implementation.
  
  Thanks,
  
  Michael
  
 
 
 I think there is an ugly global way to support this by setting some global 
 properties for HttpClient. I can't remember clearly offhand though.
 
 We should add explicit support for this I think - just like we have for 
 replication.
 
 - Mark Miller
 lucidimagination.com


Re: Using Customized sorting in Solr

2012-04-26 Thread Jan Høydahl
Hi,

How about trying grouping with paging?
First you do 
group=true&group.field=advertiserId&group.limit=1&group.offset=0&group.main=true&sort=something&group.sort=how-much-paid desc

That gives you one listing per advertiser, sorted the way you like.
Then to grab the next batch of ads, you go group.offset=1 etc etc.
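
A minimal sketch of the paging idea above, assuming the placeholder names
from this message (advertiserId, something, how-much-paid) and two
hypothetical helper functions. It only builds the query string for each
"round" and models the resulting order in plain Python; it does not talk
to a Solr server, and the order of advertisers within a round is assumed
to follow the input order rather than a real sort.

```python
from urllib.parse import urlencode


def round_query_string(round_index,
                       group_field="advertiserId",      # placeholder field name
                       sort="something",                # placeholder sort
                       group_sort="how-much-paid desc"):  # placeholder sort
    """Build the grouping parameters for one round.

    Round 0 asks for the 1st listing of every advertiser, round 1 for the
    2nd listing of every advertiser, and so on (via group.offset).
    """
    params = [
        ("group", "true"),
        ("group.field", group_field),
        ("group.limit", 1),
        ("group.offset", round_index),
        ("group.main", "true"),
        ("sort", sort),
        ("group.sort", group_sort),
    ]
    return urlencode(params)


def simulate_rounds(listings_by_advertiser):
    """Model the final order: one listing per advertiser per round."""
    order = []
    round_index = 0
    while True:
        batch = [docs[round_index]
                 for docs in listings_by_advertiser.values()
                 if round_index < len(docs)]
        if not batch:
            return order
        order.extend(batch)
        round_index += 1


if __name__ == "__main__":
    print(round_query_string(0))
    print(simulate_rounds({"A": ["a1", "a2", "a3"], "B": ["b1"], "C": ["c1", "c2"]}))
```

The client would request round 0, then round 1, and so on, stopping when a
round comes back empty.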

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 26. apr. 2012, at 08:10, solr user wrote:

 Hi,
 
 We are planning to move the search of one of our listing-based portals from
 the Sphinx search server to the Solr/Lucene search server. But we are facing a
 challenge in porting the customized sorting used in our portal. We only
 have the last 60 days of data live. The algorithm is as follows:
 
   1.  Put all listings into 54 buckets – (Date bucket for 60 days)  i.e.
   buckets of 7day, 1 day, 1 day……
   2.  For each date bucket we make 2 buckets –(Paid / free bucket)
   3.  For each paid / free bucket cycle the advertisers on uniqueness basis
 
  i.e. inside a bucket the ordering should be the 1st listing
 of each advertiser, then the 2nd listing of each advertiser, and so on.
  In other words, within a *sub-bucket* the second listing of an
 advertiser will be displayed only after the first listing of every
 advertiser has been displayed.
 
 For taking care of point 1 and 2 we have created a field named bucket_index
 at the time of indexing the data and get the results sorted by this index,
 but we are not able to find a way to create a sort field at index time or
 think of a sort function for the point no 3.  Please suggest if there is a
 way to do so in solr.
 
 Tia,
 
 BC Rathore



Per-User Sorting on an ExternalFileField

2012-04-26 Thread Phill Tornroth
I'm trying pretty hard to come up with a solution that lets me sort by
per-user scores that I calculate based on my data. Today, I'm trying to use
a combination of ExternalFileField and dynamic fields, where the
presumption is that each user might have their own file full of scores. I
think the fields are hooked up okay, but I can't sort on them because it
appears ExternalFileField explicitly doesn't support this operation.

SEVERE: java.lang.UnsupportedOperationException
at org.apache.solr.schema.ExternalFileField.getSortField(ExternalFileField.java:91)


I'm using Solr 3.5. Does anyone have a suggestion as to how to end up
adding this extra dimension so that I can do per-user relevance? It seems
like an oft-asked, rarely-answered question.

Thanks in advance,

Phill


Re: Per-User Sorting on an ExternalFileField

2012-04-26 Thread Stephane Bailliez
On Fri, Apr 27, 2012 at 12:07 AM, Phill Tornroth famousactr...@gmail.com wrote:

 I'm using Solr 3.5. Does anyone have a suggestion as to how to end up
 adding this extra dimension so that I can do per-user relevance? It seems
 like an oft-asked, rarely-answered question.


Use a function that makes use of your ExternalFileField and alters the
score, so that you can sort on the score?


Re: Per-User Sorting on an ExternalFileField

2012-04-26 Thread Phill Tornroth
So, I did just issue:

   sort=sub(my_user_score_field,0)+desc

It got me past the error, but still doesn't appear to be actually using the
values to sort. Any ideas as to why?

Phill

On Thu, Apr 26, 2012 at 4:35 PM, Stephane Bailliez sbaill...@gmail.com wrote:

 On Fri, Apr 27, 2012 at 12:07 AM, Phill Tornroth famousactr...@gmail.com
 wrote:

  I'm using Solr 3.5. Does anyone have a suggestion as to how to end up
  adding this extra dimension so that I can do per-user relevance? It seems
  like an oft-asked, rarely-answered question.
 

 Use a function that make use of your externalfilefield and alter the score
 so that you can sort on the score ?



Re: impact of EdgeNGramFilterFactory on indexing process?

2012-04-26 Thread Erick Erickson
1) Yes. EdgeNGram will inevitably increase the number of tokens in
your index, lengthening your index time. How much? Some, but you'll
have to try it to see whether it's unacceptable. Some people
can't take an increase of 10%. Some can take a 100% increase.

2) No. It will increase the number of _tokens_ in each document in
your index, and the index size. But the number of documents is
unchanged.
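
To get a feel for the token growth, here is a rough Python model of
front-edge n-gramming with the minGramSize=3 / maxGramSize=15 settings
from the question. It is only a sketch of the idea, not the Lucene
implementation: the function name is made up, and it assumes that tokens
shorter than the minimum gram size produce no grams at all.

```python
def front_edge_ngrams(token, min_gram=3, max_gram=15):
    """Return the prefixes of `token` with lengths min_gram..max_gram.

    Tokens shorter than min_gram yield an empty list in this model.
    """
    longest = min(len(token), max_gram)
    return [token[:n] for n in range(min_gram, longest + 1)]


def tokens_after_ngramming(tokens, min_gram=3, max_gram=15):
    """Count how many index tokens a list of input tokens turns into."""
    return sum(len(front_edge_ngrams(t, min_gram, max_gram)) for t in tokens)


if __name__ == "__main__":
    words = ["solr", "refrigerator", "ab"]
    for word in words:
        print(word, front_edge_ngrams(word))
    print("input tokens:", len(words),
          "indexed tokens:", tokens_after_ngramming(words))
```

The number of documents never appears in this calculation; only the
per-document token count grows, and longer words grow it the most.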

Best
Erick

On Thu, Apr 26, 2012 at 12:09 PM, geeky2 gee...@hotmail.com wrote:

 Hello all,

 i am experimenting with EdgeNGramFilterFactory - on two of the fieldTypes in
 my schema.

       <filter class="solr.EdgeNGramFilterFactory" minGramSize="3" maxGramSize="15" side="front"/>

 i believe i understand this - but want to verify:

 1) will this increase my index time?
 2) will it increase the number of documents in my index?

 thank you

 --
 View this message in context: 
 http://lucene.472066.n3.nabble.com/impact-of-EdgeNGramFilterFactory-on-indexing-process-tp3941743p3941743.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Question on Facet counts by grouped results

2012-04-26 Thread Erick Erickson
Yes, stopwords_en.txt. Or go into your schema file, find the
usages of stopwords_en.txt, and change them to a stopwords file
that exists in your setup.
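
One way to find every such file up front is to scan the schema for
resource attributes and compare them with what is actually in the conf
directory. The sketch below is a hypothetical helper, not part of Solr; it
assumes the resources are referenced through the words=, synonyms= and
protected= attributes and that the caller supplies the list of files
present in conf.

```python
import re

# Attribute names assumed to point at resource files in schema.xml.
RESOURCE_ATTRS = ("words", "synonyms", "protected")
_PATTERN = re.compile(r'\b(?:%s)="([^"]+)"' % "|".join(RESOURCE_ATTRS))


def referenced_resources(schema_text):
    """Collect every file name referenced by a resource attribute."""
    found = set()
    for value in _PATTERN.findall(schema_text):
        # An attribute value may hold a comma-separated list of files.
        found.update(name.strip() for name in value.split(",") if name.strip())
    return found


def missing_resources(schema_text, conf_files):
    """Return referenced files that are not present in the conf directory."""
    return sorted(referenced_resources(schema_text) - set(conf_files))


if __name__ == "__main__":
    sample = (
        '<filter class="solr.StopFilterFactory" ignoreCase="true" '
        'words="stopwords_en.txt"/>\n'
        '<filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt"/>\n'
    )
    print(missing_resources(sample, ["schema.xml", "synonyms.txt"]))
```

In practice conf_files would come from listing the copied conf directory,
keeping relative paths as they appear in the schema.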

Best
Erick

On Thu, Apr 26, 2012 at 5:15 PM, Sohail Aboobaker sabooba...@gmail.com wrote:
 Hi,

 I am trying nightly build for solr 4.0. I downloaded the build and am
 able to start it. In 3.x, I copied the example directory and updated
 the schema.xml. It worked fine but in 4.0, I did the same thing (make
 a copy of example) but when I change the schema, I get following:

 Apr 26, 2012 5:04:12 PM org.apache.solr.common.SolrException log
 SEVERE: null:java.lang.RuntimeException: Can't find resource
 'stopwords_en.txt' in classpath or 'solr/./conf/',
 cwd=/apps/servers/apache-solr-4.0-2012-04-26_08-10-58/apache-solr-4.0-2012-04-26_08-10-58/trusted

 Do i need to copy some other files in my copied directory as well?

 Sohail


Benchmark Solr vs Elastic Search vs Sensei

2012-04-26 Thread Volodymyr Zhabiuk
Hi Solr users

I've implemented a project to compare the performance of
Solr, Elastic Search and SenseiDB:
https://github.com/vzhabiuk/search-perf
Solr version 3.5.0 was used. I used the default configuration,
just enabled JSON updates, and used the following schema:
https://github.com/vzhabiuk/search-perf/blob/master/configs/solr/schema.xml
2.5 mln documents were put into the index; after
that I launched the indexing process to add another 500k docs. I
was issuing a commit after each 500-doc batch. At the
same time I launched a concurrent client that sent the
following type of query:
((tags:moon-roof%20or%20tags:electric%20or%20tags:highend%20or%20tags:hybrid)%20AND%20(!tags:family%20AND%20!tags:chick%20magnet%20AND%20!tags:soccer%20mom))%20
OR%20((color:red%20or%20color:green%20or%20color:white%20or%20color:yellow)%20AND%20(!color:gold%20AND%20!color:silver%20AND%20!color:black))%20
OR%20mileage:[15001%20TO%2017500]%20OR%20mileage:[17501%20TO%20*]%20
OR%20city:u.s.a.*
facet=true&facet.field=tags&facet.field=color
The query is a top-level OR query consisting of 2 term clauses, 2
ranges and 1 prefix. It is designed to hit ~60-70% of all the docs.
Here are the performance results:
#Threads  min       median    mean      75%       qps
   1      208.95ms  332.66ms  350.48ms  422.92ms  2.8
   2      188.68ms  338.09ms  339.22ms  402.15ms  5.9
   3      151.06ms  326.64ms  336.20ms  418.61ms  8.8
   4      125.13ms  332.90ms  332.18ms  396.14ms  12.0
If there is no indexing process in the background,
the results are as follows for 2.6 mln docs:
#Threads  min       median    mean      75%       qps
   1      106.70ms  199.66ms  199.40ms  234.89ms  5.1
   2      128.61ms  199.12ms  201.81ms  229.89ms  9.9
   3      110.99ms  197.43ms  203.13ms  232.25ms  14.7
   4      90.24ms   201.46ms  200.46ms  227.75ms  19.9
   5      106.14ms  208.75ms  207.69ms  242.88ms  24.0
   6      103.75ms  208.91ms  211.23ms  238.60ms  28.3
   7      113.54ms  207.07ms  209.69ms  239.99ms  33.3
   8      117.32ms  216.38ms  224.74ms  258.74ms  35.5
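
As a sanity check on the numbers, the qps column should be roughly
threads / mean latency if each client thread sends its next query as soon
as the previous one returns. That closed-loop behaviour is an assumption
about the benchmark client, and the helper names below are made up for
illustration; the figures are copied from the two tables.

```python
def expected_qps(threads, mean_latency_ms):
    """Throughput of a closed-loop client with `threads` workers."""
    return threads * 1000.0 / mean_latency_ms


# (threads, mean latency in ms, reported qps)
WITH_INDEXING = [
    (1, 350.48, 2.8), (2, 339.22, 5.9), (3, 336.20, 8.8), (4, 332.18, 12.0),
]
WITHOUT_INDEXING = [
    (1, 199.40, 5.1), (2, 201.81, 9.9), (3, 203.13, 14.7), (4, 200.46, 19.9),
    (5, 207.69, 24.0), (6, 211.23, 28.3), (7, 209.69, 33.3), (8, 224.74, 35.5),
]


def report(rows):
    """Print the expected throughput next to the reported one."""
    for threads, mean_ms, reported in rows:
        print("%d threads: expected %.1f qps, reported %.1f qps"
              % (threads, expected_qps(threads, mean_ms), reported))


if __name__ == "__main__":
    report(WITH_INDEXING)
    report(WITHOUT_INDEXING)
```

Under that assumption, throughput in both runs is bounded by per-query
latency, so anything that lowers latency during indexing should raise qps
proportionally.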
I've got three questions so far:
1. With background indexing the latency is almost 2 times
higher. Is there any way to overcome this?
2. How can we tune Solr to get better results?
3. What, in your opinion, is the preferred type of query that I can
use for the benchmark?

With many thanks,
Volodymyr


BTW here is the spec of my machine
RedHat 6.1 64bit
Intel XEON e5620 @2.40 GHz, 8 cores
63 GB RAM


Re: HTTP Auth and Distributed Search?

2012-04-26 Thread Lance Norskog
I believe you can set up certificates. You then store the certificates
in a Java keystore file, and tell Java about the keystore at startup
(e.g. via the javax.net.ssl.keyStore and javax.net.ssl.trustStore
system properties).
Now, when you make an HTTP connection, the HTTP library automatically
uses the certificates. You don't need any custom code in the HTTP
client.

(I think this is how it works, anyway.)

On Thu, Apr 26, 2012 at 3:01 PM, Michael Della Bitta
michael.della.bi...@appinions.com wrote:
 Really? Is that in a .properties file somewhere, or would I have to do
 it in code?

 I was sort of hoping I'd be able to add the credentials to the URL in
 the shards field, but looking at the source, that won't fly. While we're
 on the topic, it might be nice to be able to specify the connection
 scheme, too (e.g. for HTTPS).

 I'd be willing to make a patch if there's a decision on the way this
 should work.

 Thanks,

 Michael

 On Thu, 2012-04-26 at 17:55 -0400, Mark Miller wrote:
 On Apr 26, 2012, at 5:25 PM, Michael Della Bitta wrote:

  Hi,
 
  I'm wondering if there's any way to use container-based HTTP auth and
  Distributed Search configured in the SearchHandler that I haven't
  discovered aside from writing my own shard handler implementation.
 
  Thanks,
 
  Michael
 


 I think there is an ugly global way to support this by setting some global 
 properties for HttpClient. I can't remember clearly offhand though.

 We should add explicit support for this I think - just like we have for 
 replication.

 - Mark Miller
 lucidimagination.com


-- 
Lance Norskog
goks...@gmail.com