&lt;str name="stream_size"&gt;null&lt;/str&gt; when using HttpSolrServer

2012-11-06 Thread sh

Good day,

I recently moved to SolrJ 3.6.1. As the CommonsHttpSolrServer class is deprecated in that version, I 
migrated to HttpSolrServer. But now Tika does not generate the stream_size field correctly: the 
result response for an arbitrary JPEG file contains 
&lt;str name="stream_size"&gt;null&lt;/str&gt;. Is there any known way to fix that?

The extract handler is defined in solrconfig.xml as:


  &lt;requestHandler name="/update/extract"
      class="solr.extraction.ExtractingRequestHandler"&gt;
    &lt;lst name="defaults"&gt;
      &lt;str name="lowernames"&gt;true&lt;/str&gt;
      &lt;str name="fmap.owner"&gt;file_owner&lt;/str&gt;
      &lt;str name="fmap.path"&gt;file_path&lt;/str&gt;
    &lt;/lst&gt;
  &lt;/requestHandler&gt;

 The field in schema.xml looks like this:

 &lt;field name="stream_size" type="string" indexed="true" stored="true"
     multiValued="false" /&gt;

Kind regards,

Silvio


Solr / Velocity url rewrite

2012-11-06 Thread Sébastien Dartigues
Hi all,

Today I'm using Solritas as the front end for the Solr search engine.

But I would like to do URL rewriting to deliver URLs more compliant with
SEO.

First the end user types this kind of URL: http://host.com/query/myquery

This URL should be rewritten internally (kind of reverse proxy) to
http://localhost:8983/query?q=myquery.

This internal URL should not be displayed to the end user, and in return,
when the result page is displayed, all the links in the page should be
rewritten with SEO-compliant URLs.

I tried to perform some tests with an Apache front end using mod_proxy,
but I didn't succeed in passing URL parameters.
Has anyone ever tried to do SEO with the Solr search engine (Solritas front end)?

Thanks for your help.
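For what it's worth, the internal mapping described above can be sketched with mod_rewrite rather than mod_proxy alone. The hostnames, port, and /query path are taken from the example URLs; this fragment is untested and illustrative only:

```apache
# Hypothetical Apache front-end fragment: rewrite the public
# /query/myquery form to the internal Solr URL, proxying the result.
RewriteEngine On
RewriteRule ^/query/(.+)$ http://localhost:8983/query?q=$1 [P,QSA]
# [P]   proxies the rewritten request (requires mod_proxy + mod_proxy_http)
# [QSA] appends any extra query-string parameters from the original URL
```

The [P] flag is what lets a path segment become a query parameter, which ProxyPass cannot do by itself; rewriting links inside the returned HTML would still need something like mod_proxy_html or a middleware layer.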


RE: Continuous Ping query caused exception: java.util.concurrent.RejectedExecutionException

2012-11-06 Thread Markus Jelsma
https://issues.apache.org/jira/browse/SOLR-4037

 
 
-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Sat 03-Nov-2012 14:24
 To: solr-user@lucene.apache.org
 Subject: Re: Continuous Ping query caused exception: 
 java.util.concurrent.RejectedExecutionException
 
 
 On Nov 1, 2012, at 5:39 AM, Markus Jelsma markus.jel...@openindex.io wrote:
 
  File bug?
 
 Please.
 
 - Mark


RE: SolrCloud indexing blocks if node is recovering

2012-11-06 Thread Markus Jelsma

https://issues.apache.org/jira/browse/SOLR-4038
Still trying to gather the logs
 
 
-Original message-
 From:Mark Miller markrmil...@gmail.com
 Sent: Sat 03-Nov-2012 14:17
 To: Markus Jelsma markus.jel...@openindex.io
 Cc: solr-user@lucene.apache.org
 Subject: Re: SolrCloud indexing blocks if node is recovering
 
 The OOM machine and any surrounding if possible (eg especially the leader of 
 the shard).
 
 Not sure what I'm looking for yet, so the more info the better.
 
 - Mark
 
 On Nov 3, 2012, at 5:23 AM, Markus Jelsma markus.jel...@openindex.io wrote:
 
  Hi - yes, i should be able to make sense out of them next monday. I assume 
  you're not too interested in the OOM machine but all surrounding nodes that 
  blocked instead? 
  
  
  
  -Original message-
  From:Mark Miller markrmil...@gmail.com
  Sent: Sat 03-Nov-2012 03:14
  To: solr-user@lucene.apache.org
  Subject: Re: SolrCloud indexing blocks if node is recovering
  
  Doesn't sound right. Still have the logs?
  
  - Mark
  
  On Fri, Nov 2, 2012 at 9:45 AM, Markus Jelsma
  markus.jel...@openindex.io wrote:
  Hi,
  
  We just tested indexing some million docs from Hadoop to a 10 node 2 rep 
  SolrCloud cluster with this week's trunk. One of the nodes gave an OOM 
  but indexing continued without interruption. When i restarted the node 
  indexing stopped completely, the node tried to recover - which was 
  unsuccessful. I restarted the node again but that wasn't very helpful 
  either. Finally i decided to stop the node completely and see what 
  happens - indexing resumed.
  
  Why or how won't the other nodes accept incoming documents when one node 
  behaves really bad? The dying node wasn't the node we were sending 
  documents to and we are not using CloudSolrServer yet (see other thread). 
  Is this known behavior? Is it a bug?
  
  Thanks,
  Markus
  
  
  
  -- 
  - Mark
  
 
 


Re: Where to get more documents or references about Solr cloud?

2012-11-06 Thread Lance Norskog
LucidFind is a searchable archive of Solr documentation and email lists:

http://find.searchhub.org/?q=solrcloud

- Original Message -
| From: Jack Krupansky j...@basetechnology.com
| To: solr-user@lucene.apache.org
| Sent: Monday, November 5, 2012 4:44:46 AM
| Subject: Re: Where to get more documents or references about Solr cloud?
| 
| Is most of the Web blocked in your location? When I Google
| SolrCloud,
| Google says that there are About 61,400 results with LOTS of
| informative
| links, including blogs, videos, slideshares, etc. just on the first
| two
| pages of search results alone.
| 
| If you have specific questions, please ask them with specific detail,
| but
| try reading a few of the many sources of information available on the
| Web
| first.
| 
| -- Jack Krupansky
| 
| -Original Message-
| From: SuoNayi
| Sent: Monday, November 05, 2012 3:32 AM
| To: solr-user@lucene.apache.org
| Subject: Where to get more documents or references about Solr cloud?
| 
| Hi all, there is only one entry about solr cloud on the
| wiki, http://wiki.apache.org/solr/SolrCloud.
| I have googled a lot and found no more details about solr cloud, or
| maybe I
| miss something?
| 
| 


Re: Does SolrCloud supports MoreLikeThis?

2012-11-06 Thread Lance Norskog
The question you meant to ask is: "Does MoreLikeThis support Distributed 
Search?" and the answer apparently is no. This is the issue to get it working:

https://issues.apache.org/jira/browse/SOLR-788

(Distributed Search is independent of SolrCloud.) If you want to write unit 
tests, that would really help: they won't work now, but they will make it easier 
for someone to get the patch working again. Also, the patch will not get 
committed without unit tests.

Lance

- Original Message -
| From: Luis Cappa Banda luisca...@gmail.com
| To: solr-user@lucene.apache.org
| Sent: Monday, November 5, 2012 7:54:59 AM
| Subject: Re: Does SolrCloud supports MoreLikeThis?
| 
| Thanks for the answer, Darren! I still have the hope that MLT is
| supported
| in the current version. An important feature of the product that I'm
| developing depends on that, and even if I can emulate MLT with a
| Dismax or
| E-dismax component, the thing is that MLT fits and works perfectly...
| 
| Regards,
| 
| Luis Cappa.
| 
| 
| 2012/11/5 Darren Govoni dar...@ontrenet.com
| 
|  There is a ticket for that with some recent activity (sorry I don't
|  have
|  it handy right now), but I'm not sure if that work made it into the
|  trunk,
|  so probably solrcloud does not support MLT...yet. Would love an
|  update from
|  the dev team though!
| 
|  --- Original Message ---
|  On 11/5/2012 10:37 AM Luis Cappa Banda wrote:
|  That's the question, :-)
|  
|  Regards,
|  
|  Luis Cappa.
| 
| 


GC stalls cause Zookeeper timeout during uninvert for facet field

2012-11-06 Thread Arend-Jan Wijtzes
Hi,

We are running a small solr cluster with 8 cores on 4 machines. This
database has about 1E9 very small documents. One of the statistics we
need requires a facet on a text field with high cardinality.

During the uninvert phase of this text field the searchers experience
long stalls because of the garbage collecting (20+ seconds pauses) which
causes Solr to lose the Zookeeper lease. Often they do not recover 
gracefully and as a result the cluster becomes degraded:

SEVERE: There was a problem finding the leader in
zk:org.apache.solr.common.SolrException: Could not get leader props

This is a known open issue.

I explored several options to try and work around this. However I'm new
to Solr and need some help.

We tried running more cores:
We went from 4 to 8 cores. Does it make sense to go to 16 cores on 4
machines?


GC tuning:
This helped a lot but not enough to prevent the lease expirations. I'm
by no means a Java GC expert and would appreciate any tips to improve
this further. Current settings are:

Java HotSpot(TM) 64-Bit Server VM (20.0-b11)
-Xloggc:/home/solr/solr/log/gc.log
-XX:+PrintGCApplicationStoppedTime
-XX:+PrintTenuringDistribution
-XX:+PrintClassHistogram
-XX:+PrintGCTimeStamps
-XX:+PrintGCDetails
-XX:+UseCMSInitiatingOccupancyOnly
-XX:CMSInitiatingOccupancyFraction=75
-XX:MaxGCPauseMillis=1
-XX:+CMSIncrementalMode
-XX:+UseConcMarkSweepGC
-XX:+UseParNewGC
-Djava.awt.headless=true
-Xss256k
-Xmx18g
-Xms1g
-DzkHost=ds30:2181,ds31:2181,ds32:2181

Actual memory stats according to top are: 74GB virtual, 11GB resident.
The GC log shows:
- age   1:   39078968 bytes,   39078968 total
: 342633K-38290K(345024K), 24.7992520 secs]
9277535K-9058682K(11687832K) icms_dc=73 , 24.7993810 secs] [Times:
user=366.87 sys=26.31, real=24.79 secs]
Total time for which application threads were stopped: 24.8005790
seconds
975.478: [GC 975.478: [ParNew
Desired survivor size 19628032 bytes, new threshold 1 (max 4)
- age   1:   38277672 bytes,   38277672 total
: 343750K-37537K(345024K), 22.4217640 secs]
9364142K-9131962K(11687832K) icms_dc=73 , 22.4218650 secs] [Times:
user=331.25 sys=23.85, real=22.42 secs]
Total time for which application threads were stopped: 22.4231750
seconds

etc.


Solr version:
4.0.0.2012.10.06.03.04.33

Current hardware consists of 4 machines, of which each has:
2x E5645 CPU, total of 24 cores
48GB mem
8 x SATA 7200RPM in raid 10


What would be a good strategy to try and get this database to perform
the way we need it? Would it make sense to split it up into 16 shards?
Ways to improve the GC behavior?
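One knob worth mentioning alongside GC tuning: the ZooKeeper session timeout. If it is raised above the worst-case pause, a long collection no longer expires the lease. In Solr 4.0 this is the zkClientTimeout setting in solr.xml; the 30 s value below is an assumption sized to the ~25 s pauses shown in the log, and the ZooKeeper ensemble's maxSessionTimeout must permit it:

```xml
<!-- solr.xml sketch: raise the ZooKeeper client timeout above the
     observed worst-case GC pause (~25 s in the log above). -->
<solr persistent="true">
  <cores adminPath="/admin/cores" zkClientTimeout="30000">
    <core name="collection1" instanceDir="collection1" />
  </cores>
</solr>
```

This only masks the symptom; the GC pauses themselves still stall searchers for the same duration.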

Any help would be greatly appreciated.

AJ

-- 
Arend-Jan Wijtzes -- Wiseguys -- www.wise-guys.nl


Re: SolrCloud - configuration management in ZooKeeper

2012-11-06 Thread Tomás Fernández Löbbe
Hi Alexey, responses are inline:

Zookeeper manages not only the cluster state, but also the common
 configuration files.
 My question is, what are the exact rules of precedence? That is, when will a
 Solr node decide to download new configuration files?

When the SolrCore is started.


 Will configuration files be updated from ZooKeeper every time the core is
 refreshed?

Yes, every time the SolrCore is reloaded. If you need to force this, you
can either reload all the cores or reload the collection:
https://issues.apache.org/jira/browse/SOLR-3488

 What if bootstrapping is defined (bootstrap_confdir)? Will the
 node always try to upload?

if bootstrap_confdir is set, and the config name is always the same, every
time you start Solr it will upload the configuration files and override the
old ones in the same zk location.

 What are the best practices for production environment? Is it better to use
 external tool (ZkCLI) to trigger configuration changes?

I would at least not attach the bootstrap_confdir to a start script and
make it explicit. There are some Solr specific zk scripts that you can use.
See
http://wiki.apache.org/solr/SolrCloud#Getting_your_Configuration_Files_into_ZooKeeper

I would use Solr's zk script for managing the configuration.

Tomás


 Thanks




 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/SolrCloud-configuration-management-in-ZooKeeper-tp4018432.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Solr4 data import skipdoc and regex

2012-11-06 Thread Randy
Hi *,

I want to import some data to build a Solr index. For this import, I need to
skip some documents from importing. In my data-config file it looks like
this:

&lt;field column="$skipDoc" regex="^MyPattern .*" replaceWith="true"
    sourceColName="text"/&gt;

As I also need to search my 'titles' I tried this:

&lt;field column="$skipDoc" regex="^MyPattern .*" replaceWith="true"
    sourceColName="text"/&gt;
&lt;field column="$skipDoc" regex="^MyPattern2 .*" replaceWith="true"
    sourceColName="title"/&gt;
This doesn't work - that's now clear to me ;-) But how can I do it?

Thanks in advance :-)

Randy



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr4-data-import-skipdoc-and-regex-tp4018495.html
Sent from the Solr - User mailing list archive at Nabble.com.
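One way to combine several skip conditions is DIH's ScriptTransformer, which can set $skipDoc from arbitrary logic instead of chaining RegexTransformer fields. A sketch, reusing the patterns and column names from the snippets above (untested; the function name is made up):

```xml
<dataConfig>
  <!-- Skip a row when either the text or the title column matches. -->
  <script><![CDATA[
    function skipUnwanted(row) {
      var text  = row.get('text');
      var title = row.get('title');
      if ((text  != null && /^MyPattern /.test(text)) ||
          (title != null && /^MyPattern2 /.test(title))) {
        row.put('$skipDoc', 'true');
      }
      return row;
    }
  ]]></script>
  <document>
    <entity name="doc" transformer="script:skipUnwanted" query="...">
      <!-- field mappings as before -->
    </entity>
  </document>
</dataConfig>
```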


Re: Add new shard will be treated as replicas in Solr4.0?

2012-11-06 Thread Erick Erickson
bq: where can i find all the items on the road map?

Well, you really can't &lt;G&gt;... There's no official roadmap. I happen to
know this since I follow the developer's list and I've seen references to
this being important to the folks doing SolrCloud development work and it's
been a recurring theme on the user's list. It's one of those things that
_everybody_ understands would be useful in certain circumstances, but
nobody has had time to actually implement yet.

You can track this at: https://issues.apache.org/jira/browse/SOLR-2592

Best
Erick



On Mon, Nov 5, 2012 at 7:57 PM, Zeng Lames lezhi.z...@gmail.com wrote:

 btw, where can i find all the items in the road map? thanks!


 On Tue, Nov 6, 2012 at 8:55 AM, Zeng Lames lezhi.z...@gmail.com wrote:

  hi Erick, thanks for your kindly response. hv got the information from
 the
  SolrCloud wiki.
  think we may need to defined the shard numbers when we really rollout it.
 
  thanks again
 
 
  On Mon, Nov 5, 2012 at 8:40 PM, Erick Erickson erickerick...@gmail.com
 wrote:
 
  Not at present. What you're interested in is shard splitting which is
  certainly on the roadmap but not implemented yet. To expand the
  number of shards you'll have to reconfigure, then re-index.
 
  Best
  Erick
 
 
  On Mon, Nov 5, 2012 at 4:09 AM, Zeng Lames lezhi.z...@gmail.com
 wrote:
 
   Dear All,
  
   we have an existing solr collection, 2 shards, numOfShard is 2. and
  there
   are already records in the index files. now we start another solr
  instance
   with ShardId= shard3, and found that Solr treat it as replicas.
  
   check the zookeeper data, found the range of shard doesn't
   change correspondingly. shard 1 is 0-7fff, while shard 2 is
   8000-.
  
   is there any way to increase new shard for existing collection?
  
   thanks a lot!
   Lames
  
 
 
 



Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread Erick Erickson
Not that I know of. This would be extremely expensive in the usual case.
Loading up configs, reconfiguring all the handlers etc. would add a huge
amount of overhead to the commit operation, which is heavy enough as it is.

What's the use-case here? Changing your configs really often and reading
them on commit sounds like a way to make for a very confusing application!

But if you really need to re-read all this info on a running system,
consider the core admin RELOAD command.

Best
Erick
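For reference, the RELOAD Erick mentions is just an HTTP call to the CoreAdmin handler. A minimal sketch of building that request (the host, port, and core name are assumptions; issuing it would use any HTTP client):

```python
from urllib.parse import urlencode

def core_reload_url(host="localhost", port=8983, core="collection1"):
    """Build the CoreAdmin RELOAD URL for one core."""
    query = urlencode({"action": "RELOAD", "core": core})
    return f"http://{host}:{port}/solr/admin/cores?{query}"

# e.g. urllib.request.urlopen(core_reload_url()) would trigger the reload
print(core_reload_url())
# http://localhost:8983/solr/admin/cores?action=RELOAD&core=collection1
```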


On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote:

 Hi All

 I am keen to find out if Solr exposes any event listener or other hooks
 which can be used to re-read configuration files.


 I know that we have firstSearcher event but I am not sure if it causes
 request handlers to reload themselves and read the conf files again.

 For example, if I change the synonym file and solr gets a commit, will it
 re-initialize request handlers and re-read the conf files.

 Or, are there some events which can be listened to?

 Any inputs are welcome.

 Thanks
 Saroj



Searching for Partial Words

2012-11-06 Thread Sohail Aboobaker
Hi,

Given following values in the document:

Doc1: Engine
Doc2. Engineer
Doc3. ResidentEngineer

We need to return all three documents when someone searches for "engi".

Basically we need to implement partial word search. Currently, we have a
wild card on the right side of search term (term*). Is it possible to have
wild card on both sides of a search term?

Regards,
Sohail Aboobaker.


Re: load balance with SolrCloud

2012-11-06 Thread Erick Erickson
I think you're conflating shards and cores. Shards are physical slices of a
single logical index. An incoming query is sent to each and every shard and
the results tallied.

The case you're talking about seems to be more you have N separate indexes
(cores), where each core is for a specific user. This is vastly different
from SolrCloud, which puts all the data into one huge logical index!

Furthermore, presently there's no way to direct specific documents to
specific shards in SolrCloud (although a pluggable sharding mechanism is
under development).

You might be interested in SOLR-1293 (under development) for managing lots
of cores.






On Mon, Nov 5, 2012 at 4:26 PM, Jie Sun jsun5...@yahoo.com wrote:

 we are using solr 3.5 in production and we deal with customers data of
 terabytes.

 we are using shards for large customers and write our own replica
 management
 in our software.

 Now with the rapid growth of data, we are looking into solrcloud for its
 robustness of sharding and replications.

 I understand by read some documents on line that there is no SPOF using
 solrcloud, so any instance in the cluster can serve the query/index.
 However, is it true that we need to write our own load balancer in front of
 solrCloud?

 For example if we want to implement a model similar to Loggly, i.e. each
 customer starts indexing into the small shard of its own, then if any of the
 customers grow more than the small shard's limit, we switch to index into
 another small shard (we call it front end shard), meanwhile merge the just
 released small shard to next level larger shard.

 Since the merge can happen between two instances on different servers, we
 probably end up with synch the index files for the merging shards and then
 use solr merge.

 I am curious if there is anything solr provide to help on these kind of
 strategy dealing with unevenly grow big customer data (a core)? or do we
 have to write these in our software layer from scratch?

 thanks
 Jie



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Solr / Velocity url rewrite

2012-11-06 Thread Erick Erickson
Velocity/Solritas was never intended to be a user-facing app. How are you
locking things down so a user can't enter, for instance,
q=&lt;delete&gt;&lt;query&gt;*:*&lt;/query&gt;&lt;/delete&gt;&amp;commit=true?

I'd really recommend a proper middleware layer unless you have a trusted
user base...

FWIW,
Erick


On Tue, Nov 6, 2012 at 4:20 AM, Sébastien Dartigues 
sebastien.dartig...@gmail.com wrote:

 Hi all,

 Today i'm using solritas as front-end for the solr search engine.

 But i would like to do url rewriting to deliver urls more compliant with
 SEO.

 First the end user types that kind of url : http://host.com/query/myquery

 So this url should be rewriten internally (kind of reverse proxy) in
 http://localhost:8983/query?q=myquery.

 This internal url should not be displayed to the end user and in return
 when the result page is displayed all the links in the page should be
 rewritten with a SEO compliant url.

 I tried to perform some tests with an apache front end by using mod_proxy
 but i didn't succeed to pass url parameters.
 Does someone ever tried to do SEO with solr search engine (solritas front)?

 Thanks for your help.



Re: Solr / Velocity url rewrite

2012-11-06 Thread Sébastien Dartigues
Hi Erick,

Thanks for your help.
OK, apart from the PHP client delivered as a sample, do you have a preference
for an out-of-the-box, easily deployable front end?
My main use case is to be compliant with SEO, or at least to give nice
(url) entry point.

Thanks.


2012/11/6 Erick Erickson erickerick...@gmail.com

 Velocity/Solritas was never intended to be a user-facing app. How are you
 locking things down so a user can't enter, for instance,
 q=&lt;delete&gt;&lt;query&gt;*:*&lt;/query&gt;&lt;/delete&gt;&amp;commit=true?

 I'd really recommend a proper middleware layer unless you have a trusted
 user base...

 FWIW,
 Erick


 On Tue, Nov 6, 2012 at 4:20 AM, Sébastien Dartigues 
 sebastien.dartig...@gmail.com wrote:

  Hi all,
 
  Today i'm using solritas as front-end for the solr search engine.
 
  But i would like to do url rewriting to deliver urls more compliant with
  SEO.
 
  First the end user types that kind of url :
 http://host.com/query/myquery
 
  So this url should be rewriten internally (kind of reverse proxy) in
  http://localhost:8983/query?q=myquery.
 
  This internal url should not be displayed to the end user and in return
  when the result page is displayed all the links in the page should be
  rewritten with a SEO compliant url.
 
  I tried to perform some tests with an apache front end by using mod_proxy
  but i didn't succeed to pass url parameters.
  Does someone ever tried to do SEO with solr search engine (solritas
 front)?
 
  Thanks for your help.
 



Re: Searching for Partial Words

2012-11-06 Thread Jack Krupansky
Add an edge n-gram filter (EdgeNGramFilterFactory) to your index 
analyzer. This will add all the prefixes of words to the index, so that a 
query of "engi" will be equivalent to, but much faster than, the wildcard 
"engi*". You can specify a minimum size, such as 3 or 4, to eliminate tons of 
too-short prefixes, if you want.


See:
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramFilterFactory.html
http://lucene.apache.org/core/4_0_0/analyzers-common/org/apache/lucene/analysis/ngram/EdgeNGramTokenFilter.html
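To make the effect concrete, here is a small Python simulation of what a front-edge n-gram filter emits at index time. This mimics the behavior, not the Lucene code, and the lowercasing stands in for a LowerCaseFilterFactory earlier in the analysis chain:

```python
def edge_ngrams(term, min_gram=3, max_gram=10):
    """Emit front-edge n-grams of a term, as EdgeNGramFilterFactory
    with side="front" would (lowercased for illustration)."""
    term = term.lower()
    top = min(max_gram, len(term))
    return [term[:n] for n in range(min_gram, top + 1)]

# A query for "engi" now matches as a plain term lookup:
print(edge_ngrams("Engineer"))
# ['eng', 'engi', 'engin', 'engine', 'enginee', 'engineer']
```

Note that front-edge grams cover prefix matching only; hitting "engi" inside "ResidentEngineer" would need either word-splitting before the filter or NGramFilterFactory, which emits grams starting at every position.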

-- Jack Krupansky

-Original Message- 
From: Sohail Aboobaker

Sent: Tuesday, November 06, 2012 8:08 AM
To: solr-user@lucene.apache.org
Subject: Searching for Partial Words

Hi,

Given following values in the document:

Doc1: Engine
Doc2. Engineer
Doc3. ResidentEngineer

We need to return all three documents when someone searches for "engi".

Basically we need to implement partial word search. Currently, we have a
wild card on the right side of search term (term*). Is it possible to have
wild card on both sides of a search term?

Regards,
Sohail Aboobaker. 



Re: Solr 4.0 simultaneous query problem

2012-11-06 Thread Rohit Harchandani
So is it a better approach to query for smaller rows, say 500, and keep
increasing the start parameter? Wouldn't that be slower since I have an
increasing start parameter and I will also be sorting by the same field in
each of my queries made to the multiple shards?

Also, does it make sense to have all these documents in the same shard? I
went for this approach because the shard which is queried the most is small
and gives a lot of benefit in terms of time taken for all the stats
queries. This shard is only about 5 gb whereas the entire index will be
about 50 gb.

Thanks for the help,
Rohit
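On why deep paging is slow: in a distributed query, each shard has to produce its top (start + rows) sort values before the coordinator can merge them and discard the first start entries, so the per-query work grows with start regardless of how small rows is. A toy illustration of that merge (plain Python, not Solr code):

```python
import heapq

def distributed_page(shards, start, rows):
    """Merge pre-sorted per-shard result lists and return one page.

    Each shard must supply its first (start + rows) entries, which is
    why a growing start parameter makes every shard do more work."""
    per_shard = start + rows
    candidates = [shard[:per_shard] for shard in shards]
    merged = list(heapq.merge(*candidates))
    return merged[start:start + rows]

# Three toy shards, each already sorted by the sort field:
shards = [list(range(0, 30, 3)), list(range(1, 30, 3)), list(range(2, 30, 3))]
print(distributed_page(shards, start=5, rows=3))
# [5, 6, 7]
```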

On Mon, Nov 5, 2012 at 4:02 PM, Walter Underwood wun...@wunderwood.org wrote:

 Don't query for 5000 documents. That is going to be slow no matter how it
 is implemented.

 wunder

 On Nov 5, 2012, at 1:00 PM, Rohit Harchandani wrote:

  Hi,
  So it seems that when I query multiple shards with the sort criteria for
  5000 documents, it queries all shards and gets a list of document ids and
  then adds the document ids to the original query and queries all the
 shards
  again.
  This process of doing the join of query results with the unique ids and
  getting the remaining fields is turning out to be really slow. It takes a
  while to search for a list of unique ids. Is there any config change  to
  make this process faster?
  Also what does isDistrib=false mean when solr generates the queries
  internally?
  Thanks,
  Rohit
 
  On Fri, Oct 19, 2012 at 5:23 PM, Rohit Harchandani rhar...@gmail.com
 wrote:
 
  Hi,
 
  The same query is fired always for 500 rows. The only thing different is
  the start parameter.
 
  The 3 shards are in the same instance on the same server. They all have
  the same schema. But the inherent type of the documents is different.
 Also
  most of the apps queries goes to shard A which has the smallest index
  size (4gb).
 
  The query is made to a master shard which by default goes to all 3
  shards for results. (also, the query that i am trying matches documents
  only only in shard A mentioned above)
 
  Will try debugQuery now and post it here.
 
  Thanks,
  Rohit
 
 
 
 
  On Thu, Oct 18, 2012 at 11:00 PM, Otis Gospodnetic 
  otis.gospodne...@gmail.com wrote:
 
  Hi,
 
  Maybe you can narrow this down a little further.  Are there some
  queries that are faster and some slower?  Is there a pattern?  Can you
  share examples of slow queries?  Have you tried debugQuery=true?
  These 3 shards is each of them on its own server or?  Is the slow
  one always the one that hits the biggest shard?  Do they hold the same
  type of data?  How come their sizes are so different?
 
  Otis
  --
  Search Analytics - http://sematext.com/search-analytics/index.html
  Performance Monitoring - http://sematext.com/spm/index.html
 
 
  On Thu, Oct 18, 2012 at 12:22 PM, Rohit Harchandani rhar...@gmail.com
 
  wrote:
  Hi all,
  I have an application which queries a solr instance having 3
 shards(4gb,
  13gb and 30gb index size respectively) having 6 million documents in
  all.
  When I start 10 threads in my app to make simultaneous queries (with
  rows=500 and different start parameter, sort on 1 field and no facets)
  to
  solr to return 500 different documents in each query, sometimes I see
  that
  most of the responses come back within no time (500ms-1000ms), but the
  last
  response takes close to 50 seconds (Qtime).
  I am using the latest 4.0 release. What is the reason for this delay?
 Is
  there a way to prevent this?
  Thanks and regards,
  Rohit
 
 
 

 --
 Walter Underwood
 wun...@wunderwood.org






Re: SolrCloud failover behavior

2012-11-06 Thread Nick Chase
Thanks a million, Erick!  You're right about killing both nodes hosting 
the shard.  I'll get the wiki corrected.


  Nick

On 11/3/2012 10:51 PM, Erick Erickson wrote:

SolrCloud doesn't work unless every shard has at least one server that is
up and running.

I _think_ you might be killing both nodes that host one of the shards. The
admin
page has a link showing you the state of your cluster. So when this happens,
does that page show both nodes for that shard being down?

And yeah, SolrCloud requires a quorum of ZK nodes up. So with only one ZK
node, killing that will bring down the whole cluster. Which is why the
usual
recommendation is that ZK be run externally and usually an odd number of ZK
nodes (three or more).

Anyone can create a login and edit the Wiki, so any clarifications are
welcome!

Best
Erick


On Sat, Nov 3, 2012 at 12:17 PM, Nick Chase nch...@earthlink.net wrote:


I think there's a change in the behavior of SolrCloud vs. what's in the
wiki, but I was hoping someone could confirm for me.  I checked JIRA and
there were a couple of issues requesting partial results if one server
comes down, but that doesn't seem to be the issue here.  I also checked
CHANGES.txt and don't see anything that seems to apply.

I'm running "Example B: Simple two shard cluster with shard replicas" from
the wiki at https://wiki.apache.org/solr/SolrCloud, and
everything starts out as expected.  However, when I get to the part
about failover behavior, things get a little wonky.

I added data to the shard running on 7475.  If I kill 7500, a query to any
of the other servers works fine.  But if I kill 7475, rather than getting
zero results on a search to 8983 or 8900, I get a 503 error:

&lt;response&gt;
  &lt;lst name="responseHeader"&gt;
    &lt;int name="status"&gt;503&lt;/int&gt;
    &lt;int name="QTime"&gt;5&lt;/int&gt;
    &lt;lst name="params"&gt;
      &lt;str name="q"&gt;*:*&lt;/str&gt;
    &lt;/lst&gt;
  &lt;/lst&gt;
  &lt;lst name="error"&gt;
    &lt;str name="msg"&gt;no servers hosting shard:&lt;/str&gt;
    &lt;int name="code"&gt;503&lt;/int&gt;
  &lt;/lst&gt;
&lt;/response&gt;

I don't see any errors in the consoles.

Also, if I kill 8983, which includes the Zookeeper server, everything
dies, rather than just staying in a steady state; the other servers
continually show:

Nov 03, 2012 11:39:34 AM org.apache.zookeeper.ClientCnxn$SendThread
startConnect
INFO: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:9983
Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread run
WARNING: Session 0x13ac6cf87890002 for server null, unexpected error,
closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused: no further information
        at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
        at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
        at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)

Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread
startConnect

over and over again, and a call to any of the servers shows a connection
error to 8983.

This is the current 4.0.0 release, running on Windows 7.

If this is the proper behavior and the wiki needs updating, fine; I just
need to know.  Otherwise if anybody has any clues as to what I may be
missing, I'd be grateful. :)

Thanks...

---  Nick





Re: lukeall.jar for Solr4r?

2012-11-06 Thread Carrie Coy
Thank you very much for taking the time to do this.  This version is 
able to read the index files, but there is at least one issue:

The home screen reports "ERROR: can't count terms per field" and this 
exception is thrown:


java.util.NoSuchElementException
at java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1098)
at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
at 
java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)

at org.getopt.luke.IndexInfo.countTerms(IndexInfo.java:64)
at org.getopt.luke.IndexInfo.getNumTerms(IndexInfo.java:109)
at org.getopt.luke.Luke$3.run(Luke.java:1165)


On 11/05/2012 05:08 PM, Shawn Heisey wrote:

On 11/5/2012 2:52 PM, Shawn Heisey wrote:
No idea whether I did it right, or even whether it works.  All my 
indexes are either 3.5 or 4.1-SNAPSHOT, so I can't actually test it.  
You can get to the resulting jar and my patch against the 
luke-4.0.0-ALPHA source:


https://dl.dropbox.com/u/97770508/luke-4.0.0-unofficial.patch
https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial.jar

If you have an immediate need for 4.0.0 support in Luke, please try 
it out and let me know whether it works.  If it doesn't work, or when 
the official luke 4.0.0 is released, I will remove those files from 
my dropbox.


I just realized that the version I uploaded there was compiled with 
java 1.7.0_09.  I don't know if this is actually a problem, but just 
in case, I re-did the compile on a machine with 1.6.0_29.  The 
filename referenced above now points to this version and I have 
included a file that indicates its java7 origins:


https://dl.dropbox.com/u/97770508/lukeall-4.0.0-unofficial-java7.jar

Thanks,
Shawn



custom request handler

2012-11-06 Thread Lee Carroll
Hi, we are extending SearchHandler to provide a custom search request
handler. Basically we've added NamedLists called allowed, whiteList,
maxMinList, etc.

These look like the default, append and invariant namedLists in the
standard search handler config. In handleRequestBody we then remove params
not listed in the allowed named list, white list values as per the white
list and so on.

The idea is to have a safe request handler which the big bad world could
be exposed to. I'm worried, though: what protection have we missed that a
front-end app would otherwise give us?

Also, removing params from SolrParams is a bit clunky. We are basically
converting SolrParams into a NamedList, processing a new NamedList from this,
and then .setParams(SolrParams.toSolrParams(nlNew)). Is there a better way?
In particular, NamedLists are not set up for key lookups...

Anyway basically is having a custom request handler doing the above the way
to go ?

Cheers
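As a sanity check on the filtering logic itself, independent of the SolrParams/NamedList plumbing, the allow-list idea can be sketched as a plain function (the parameter names below are illustrative assumptions, not the actual lists):

```python
def filter_params(params, allowed, whitelist):
    """Keep only parameters named in `allowed`; additionally, if a
    parameter has a whitelist of values, drop it unless its value is
    listed there."""
    out = {}
    for name, value in params.items():
        if name not in allowed:
            continue                    # parameter not allowed at all
        ok = whitelist.get(name)
        if ok is not None and value not in ok:
            continue                    # allowed parameter, bad value
        out[name] = value
    return out

incoming = {"q": "solr", "rows": "10", "qt": "/update", "stream.body": "x"}
print(filter_params(incoming, {"q", "rows", "fl"}, {"rows": {"10", "20"}}))
# {'q': 'solr', 'rows': '10'}
```

In SolrJ terms, the surviving entries could be copied into a ModifiableSolrParams and passed to req.setParams(), which may be less clunky than round-tripping through a NamedList.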


Re: Searching for Partial Words

2012-11-06 Thread Sohail Aboobaker
Thanks Jack.
In the configuration below:

 &lt;fieldType name="text_edgngrm" class="solr.TextField"
     positionIncrementGap="100"&gt;
   &lt;analyzer&gt;
     &lt;tokenizer class="solr.EdgeNGramTokenizerFactory" side="front"
         minGramSize="1" maxGramSize="1"/&gt;
   &lt;/analyzer&gt;
 &lt;/fieldType&gt;

What are the possible values for side?

If I understand it correctly, minGramSize=3 and side=front will
include "eng*" but not "en*". Is this correct? So, minGramSize is the
minimum number of characters emitted from the specified side.

Does it allow side=both :) or something similar?

Regards,
Sohail
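For reference, side accepts only front and back in this factory — there is no both; matching both ends usually means indexing into two fields, one per side. And yes: with minGramSize=3 and side=front, "eng" is produced but "en" is not. A small self-contained sketch of the grams each setting emits for one term (illustrative logic only, not the Lucene implementation; the EdgeNGrams class is made up):

```java
import java.util.ArrayList;
import java.util.List;

// Illustrates what edge n-gram tokenization produces for a single token;
// this mirrors the output, not the Lucene code itself.
class EdgeNGrams {
    static List<String> edgeNGrams(String term, int min, int max, boolean front) {
        List<String> grams = new ArrayList<String>();
        for (int len = min; len <= max && len <= term.length(); len++) {
            grams.add(front ? term.substring(0, len)            // grams anchored at the start
                            : term.substring(term.length() - len)); // grams anchored at the end
        }
        return grams;
    }

    public static void main(String[] args) {
        // side="front", minGramSize=3, maxGramSize=5 over "engine":
        System.out.println(edgeNGrams("engine", 3, 5, true));  // [eng, engi, engin]
        // side="back" takes grams from the end instead:
        System.out.println(edgeNGrams("engine", 3, 5, false)); // [ine, gine, ngine]
    }
}
```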


Re: migrating from solr3 to solr4

2012-11-06 Thread Michael Della Bitta
 I got the following error in browser console:
 http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json

We can't see the contents of that link.. Could you post it on
pastebin.com or something?

Michael Della Bitta


Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271

www.appinions.com

Where Influence Isn’t a Game


On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker
caarl...@gmail.com wrote:
 I
 got the following error in browser console:

 http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json


Re: migrating from solr3 to solr4

2012-11-06 Thread Carlos Alexandro Becker
Hi Michael, thank for your answer.

I already posted it in stackoverflow (
http://stackoverflow.com/questions/13236383/migrating-from-solr3-to-solr4 ),
but, this looks like a encoding issue, actually, is exactly the error.

I'm not sure, but I look in all xml files in my JBoss and also in app,
neither mention this variables (contextPath and adminPath) related to solr.

So, or there is something that I should configure and don't know how, or
some trouble with the encoding that are escaping the $ and { around the
var (not sure, I didn't find the file where the app variable is populated).

Thanks in advance.




On Tue, Nov 6, 2012 at 1:49 PM, Michael Della Bitta 
michael.della.bi...@appinions.com wrote:

  I got the following error in browser console:
  http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json

 We can't see the contents of that link.. Could you post it on
 pastebin.com or something?

 Michael Della Bitta

 
 Appinions
 18 East 41st Street, 2nd Floor
 New York, NY 10017-6271

 www.appinions.com

 Where Influence Isn’t a Game


 On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker
 caarl...@gmail.com wrote:
  I
  got the following error in browser console:
 
  http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json




-- 
Atenciosamente,
*Carlos Alexandro Becker*
https://profiles.google.com/caarlos0


Re: SolrCloud Tomcat configuration: problems and doubts.

2012-11-06 Thread Luis Cappa Banda
Forward to solr-user mailing list. We forgot to reply to it, :-/

2012/11/5 Luis Cappa Banda luisca...@gmail.com

 Hello, Mark!

 I´ve been testing more and more and things are going better. I have tested
 what you told me about -Dbootstrap_conf=true and works fine, but the
 problem is that if I include that application parameter in every Tomcat
 instance when I deploy all Solr servers each one load again all solrCore
 configurations inside Zookeeper.

 There should exist something like a Tomcat master server which only has the
 following parameters that defines the basic SolrCloud configuration:

 JAVA_OPTS=-DzkHost=127.0.0.1:9000 -DnumShards=2 -Dbootstrap_conf=true

 Then the other Tomcat servers should have only:

 JAVA_OPTS=-DzkHost=127.0.0.1:9000


 However, I think that is not the best way to procceed. We are at 2012,
 it´s the end of the world - God (well, one of them) is angry and attacks my
 Production environment. Imagine that all servers go down and a Monit
 service restarts them at random. Maybe one common Tomcat server finishes
 it´s startup faster than the named Tomcat master server, so those SolrCloud
 configuration parameters won´t be loaded at first. That´s a problem.

 One posibility is to write a simple script to be executed in every Tomcat
 launch execution that consists on something like:

  I´m the first Tomcat and I´m launching! I´ll write a
 solrcloud.config.lock file in a well-known path (or maybe into Zookeeper)
 to announce the other Tomcats that I´ll start to load SolrCloud
 configuration files into Zookeeper. I am the Tomcat master server, so I´ll
 load* JAVA_OPTS=-DzkHost=127.0.0.1:9000 -DnumShards=2
 -Dbootstrap_conf=true* .

  I´m a second Tomcat and I´m launching! First I check if any
 solrcloud.config.lock file exists. If exists, I simple load *
 JAVA_OPTS=-DzkHost=127.0.0.1:9000* 


 And so on.



 I don´t like too much this solution because it´s not elegant and it´s very
 ad-hoc, but it works. What do you think about it? I´ve just started with
 SolrCloud four or five days ago and maybe I forget something that could
 solve this problem.

 Thank you very much, Mark.

 Regards,

 Luis Cappa.



 2012/11/3 Mark Miller markrmil...@gmail.com

 On Fri, Nov 2, 2012 at 9:05 AM, Luis Cappa Banda luisca...@gmail.com
 wrote:
  Hello, Mark!
 
  How are you? Thanks a lot for helping me. You were right about
 jetty.host
  parameter. My fianl test solr.xml looks like:
 
    <cores adminPath="/admin/cores" defaultCoreName="items_en"
   host="localhost" hostPort="9080" hostContext="items_en">
   <core name="items_en" instanceDir="items_en" />
    </cores>
 
 
  I´ve noticed that 'hostContext' parameter was also required, so I
 included
  it.

 It should default to /solr if you don't set it - it is there in case
 you deploy to a different context though.

  After those corrections the Cloud graph tree looks right, and executing
   queries doesn't return a 503 error. Phew! However, I checked in the
  Cloud
   graph tree that a collection1 appears too, pointing to
  http://localhost:8983/solr. I will continue testing if I missed
 something,
  but looks like it is creating another collection with default parameters
  (collection name, port) without control.

 It should only create what it finds in solr.xml - let me know what you
 find.

 
  While using Apache Tomcat I was forced to include in catalina.sh (or
  setenv.sh) the following environment parameters, as I told you before:
 
  JAVA_OPTS=-DzkHost=127.0.0.1:9000 -Dcollection.configName=items_en

 You should only need -DzkHost= - see below.

 
 
  Just three questions more:
 
  1. That´s a problem for me, because I would like to deploy in each
 Tomcat
  instance more than one Solr server with different configurations file (I
  mean, differents configName parameters), so including that JAVA_OPTS
 forces
  to me to deploy in that Tomcat server only Solr servers with this kind
 of
  configuration. In a production environment I would like to deploy in a
  single Tomcat instance at least for Solr servers, one per each kind of
  documents that I will index and query to. Do you know any way to
 configure
  the configName per each Solr server instance? Is it posible to
 configure it
  inside solr.xml file? Also, it make sense to deploy in each Solr server
 a
  multi-core configuration, each core with each configName allocated in
  Zookeeper, but again using that kind of JAVA_OPTS on-fire params
  configuration makes it impossible, :-(

 That config name sys prop is not being used here - it's only used when
 you use -Dbootstrap_confdir=path, and then only the first time you
 start up.

  Collections are linked to configuration sets in ZooKeeper. If you use
  -Dbootstrap_conf=true, a special rule is used that auto links
  collections and config sets with the same name as the collection.
  Otherwise, you can use the ZkCli cmd line tool to link any collection
  to any config in zookeeper.



 
  2. The other question is about indexing. What is the best way to plain
 index
  (I 

Re: migrating from solr3 to solr4

2012-11-06 Thread Stefan Matheis
Hey Carlos

just had a quick look at our changes and figured out the revision which 
introduced this change, which might help you while having another look?

http://svn.apache.org/viewvc?view=revisionrevision=1297578

The LoadAdminUiServlet is responsible for replacing those placeholders which 
are causing your problems

HTH at least a bit
Stefan



On Tuesday, November 6, 2012 at 5:02 PM, Carlos Alexandro Becker wrote:

 just found this in the admin.html head:
  
 https://gist.github.com/4025669
  
  
 On Tue, Nov 6, 2012 at 1:57 PM, Carlos Alexandro Becker
 caarl...@gmail.com (mailto:caarl...@gmail.com)wrote:
  
  Hi Michael, thank for your answer.
   
  I already posted it in stackoverflow (
  http://stackoverflow.com/questions/13236383/migrating-from-solr3-to-solr4 ),
  but, this looks like a encoding issue, actually, is exactly the error.
   
  I'm not sure, but I look in all xml files in my JBoss and also in app,
  neither mention this variables (contextPath and adminPath) related to solr.
   
  So, or there is something that I should configure and don't know how, or
  some trouble with the encoding that are escaping the $ and { around the
  var (not sure, I didn't find the file where the app variable is populated).
   
  Thanks in advance.
   
   
   
   
  On Tue, Nov 6, 2012 at 1:49 PM, Michael Della Bitta 
  michael.della.bi...@appinions.com 
  (mailto:michael.della.bi...@appinions.com) wrote:
   
I got the following error in browser console:
   http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json

   We can't see the contents of that link.. Could you post it on
   pastebin.com (http://pastebin.com) or something?

   Michael Della Bitta

   
   Appinions
   18 East 41st Street, 2nd Floor
   New York, NY 10017-6271

   www.appinions.com (http://www.appinions.com)

   Where Influence Isn’t a Game


   On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker
   caarl...@gmail.com (mailto:caarl...@gmail.com) wrote:
I
got the following error in browser console:


   http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
   
   
   
   
  --
  Atenciosamente,
  *Carlos Alexandro Becker*
  https://profiles.google.com/caarlos0
  
  
  
  
 --  
 Atenciosamente,
 *Carlos Alexandro Becker*
 https://profiles.google.com/caarlos0





Re: lukeall.jar for Solr4r?

2012-11-06 Thread Shawn Heisey

On 11/6/2012 7:45 AM, Carrie Coy wrote:
Thank you very much for taking the time to do this.   This version is 
able to read the index files, but there is at least one issue:


The home screen reports ERROR: can't count terms per field and this 
exception is thrown:


java.util.NoSuchElementException
at 
java.util.TreeMap$PrivateEntryIterator.nextEntry(TreeMap.java:1098)

at java.util.TreeMap$KeyIterator.next(TreeMap.java:1154)
at 
java.util.Collections$UnmodifiableCollection$1.next(Collections.java:1010)

at org.getopt.luke.IndexInfo.countTerms(IndexInfo.java:64)
at org.getopt.luke.IndexInfo.getNumTerms(IndexInfo.java:109)
at org.getopt.luke.Luke$3.run(Luke.java:1165)


That particular change, around IndexInfo.java line 64 (and a few other 
locations as well), is the one part of my changes that I actually had 
confidence in.  I have no idea how to fix it.  I'll go ahead and remove 
the jars from my dropbox, since they don't work.


Thanks,
Shawn



Re: GC stalls cause Zookeeper timeout during uninvert for facet field

2012-11-06 Thread Gil Tene
On Nov 6, 2012 at 6:06 AM, Arend-Jan Wijtzes 
ajwyt...@wise-guys.nl wrote:
...
During the uninvert phase of this text field the searchers experience
long stalls because of the garbage collecting (20+ seconds pauses) which
causes Solr to lose the Zookeeper lease. Often they do not recover
gracefully and as a result the cluster becomes degraded:

SEVERE: There was a problem finding the leader in
zk:org.apache.solr.common.SolrException: Could not get leader props

This is an known open issue.

warning: commercial product mention follows

Using the Zing JVM is a simple, immediate way to get around this and other known 
GC-related issues. Zing eliminates GC pauses as a concern for enterprise 
applications such as this, driving worst-case JVM-related hiccups down to the 
milliseconds level. This behavior will tend to happen out-of-the-box, with 
little or no tuning, and at any heap size your server can support. For example, 
on the specific server configurations you mention (24 vcores, 48GB of RAM) you 
should be able to comfortably run with a -Xmx of 30GB and no longer worry about 
pauses. We've had people run much larger than that (e.g. 
http://blog.mikemccandless.com/2012/07/lucene-index-in-ram-with-azuls-zing-jvm.html).

In full disclosure, I work for (and am the CTO at) Azul.

-- Gil.


Re: migrating from solr3 to solr4

2012-11-06 Thread Carlos Alexandro Becker
Hi Stefan,

Thank you very much, I just realized that I didn't update the web.xml, so
I didn't have the LoadAdminUiServlet configured; that's why it was not working.

By now, the only problem I still have, is that it tries to access
solr.home/collection1/conf, and I used to have it in solr.home/conf..

How can I fix this?

Thank you very much for your help.


On Tue, Nov 6, 2012 at 3:01 PM, Stefan Matheis matheis.ste...@gmail.comwrote:

 Hey Carlos

 just had a quick look at our changes and figured out the revision which
 introduced this change, which might help you while having another look?

 http://svn.apache.org/viewvc?view=revisionrevision=1297578

 The LoadAdminUiServlet is responsible for replacing those placeholders
 which are causing your problems

 HTH at least a bit
 Stefan



 On Tuesday, November 6, 2012 at 5:02 PM, Carlos Alexandro Becker wrote:

  just found this in the admin.html head:
 
  https://gist.github.com/4025669
 
 
  On Tue, Nov 6, 2012 at 1:57 PM, Carlos Alexandro Becker
  caarl...@gmail.com (mailto:caarl...@gmail.com)wrote:
 
   Hi Michael, thank for your answer.
  
   I already posted it in stackoverflow (
  
 http://stackoverflow.com/questions/13236383/migrating-from-solr3-to-solr4),
   but, this looks like a encoding issue, actually, is exactly the error.
  
   I'm not sure, but I look in all xml files in my JBoss and also in app,
   neither mention this variables (contextPath and adminPath) related to
 solr.
  
   So, or there is something that I should configure and don't know how,
 or
   some trouble with the encoding that are escaping the $ and {
 around the
   var (not sure, I didn't find the file where the app variable is
 populated).
  
   Thanks in advance.
  
  
  
  
   On Tue, Nov 6, 2012 at 1:49 PM, Michael Della Bitta 
   michael.della.bi...@appinions.com (mailto:
 michael.della.bi...@appinions.com) wrote:
  
 I got the following error in browser console:
   
 http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
   
We can't see the contents of that link.. Could you post it on
pastebin.com (http://pastebin.com) or something?
   
Michael Della Bitta
   

Appinions
18 East 41st Street, 2nd Floor
New York, NY 10017-6271
   
www.appinions.com (http://www.appinions.com)
   
Where Influence Isn’t a Game
   
   
On Tue, Nov 6, 2012 at 8:35 AM, Carlos Alexandro Becker
caarl...@gmail.com (mailto:caarl...@gmail.com) wrote:
 I
 got the following error in browser console:
   
   
   
 http://localhost:8080/indexer/$%7BcontextPath%7D$%7BadminPath%7D?wt=json
  
  
  
  
   --
   Atenciosamente,
   *Carlos Alexandro Becker*
   https://profiles.google.com/caarlos0
 
 
 
 
  --
  Atenciosamente,
  *Carlos Alexandro Becker*
  https://profiles.google.com/caarlos0






-- 
Atenciosamente,
*Carlos Alexandro Becker*
http://caarlos0.github.com/about


Re: Reply:Re: Where to get more documents or references about sold cloud?

2012-11-06 Thread Otis Gospodnetic
Hi,

On Mon, Nov 5, 2012 at 8:24 PM, SuoNayi suonayi2...@163.com wrote:

 Thanks jack and thanks for the great country.
 All big famous websites such as google, slideshares and blogspot etc are
 blocked.
 What I want to know is more details about SolrCloud; here are my
 questions:
 1.Can we control the relocation of shard / replica dynamically?


Don't think so, if you think manually.


 2.Can we move shard between solr instances?


SolrCloud does this, there is no manual moving option now.


 3.Is one solr instance related to one shard / replica?


A single shard or a single replica of a shard lives on just 1 Solr server.
 A replica of a shard hosted on server A can and should be on a different server B.

4.What's the sharding key algorithm?


Hashing on the doc key and # of nodes, I believe.
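In rough terms the routing works like the sketch below: the hash space is divided into one contiguous range per shard, and a document lands on the shard whose range covers the hash of its unique key. This is a simplified stand-in — Solr's real router uses a different hash function than String.hashCode(), and the HashRouting class here is hypothetical:

```java
// Simplified sketch of hash-range routing: split the 32-bit hash space into
// one contiguous range per shard and send each document to the shard whose
// range contains the hash of its unique key. String.hashCode() is only a
// stand-in for Solr's actual hash function.
class HashRouting {
    static int shardFor(String docId, int numShards) {
        long hash = docId.hashCode() & 0xffffffffL; // treat the hash as unsigned 32-bit
        long rangeSize = (1L << 32) / numShards;    // size of each shard's slice
        return (int) Math.min(hash / rangeSize, numShards - 1); // clamp the top edge
    }

    public static void main(String[] args) {
        int[] counts = new int[2];
        for (int i = 0; i < 10000; i++) {
            counts[shardFor("doc-" + i, 2)]++;
        }
        // keys spread across the two shards deterministically
        System.out.println(counts[0] + " / " + counts[1]);
    }
}
```

Because routing depends only on the key and the fixed ranges, every node computes the same destination — which is also why changing the shard count requires re-indexing.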


 5.Does it support custom sharding key?


Not yet, I believe.

See
http://search-lucene.com/?q=cloud+sharding&fc_project=Solr&fc_type=mail+_hash_+user&fc_type=jira

Otis
--
Search Analytics - http://sematext.com/search-analytics/index.html
Performance Monitoring - http://sematext.com/spm/index.html





 At 2012-11-05 20:44:46,Jack Krupansky j...@basetechnology.com wrote:
 Is most of the Web blocked in your location? When I Google SolrCloud,
 Google says that there are About 61,400 results with LOTS of informative
 links, including blogs, videos, slideshares, etc. just on the first two
 pages pf search results alone.
 
 If you have specific questions, please ask them with specific detail, but
 try reading a few of the many sources of information available on the Web
 first.
 
 -- Jack Krupansky
 
 -Original Message-
 From: SuoNayi
 Sent: Monday, November 05, 2012 3:32 AM
 To: solr-user@lucene.apache.org
 Subject: Where to get more documents or references about sold cloud?
 
 Hi all, there is only one entry about solr cloud on the
 wiki,http://wiki.apache.org/solr/SolrCloud.
 I have googled a lot and found no more details about solr cloud, or maybe
 I
 miss something?
 



New Index directory regardless of Solr.xml

2012-11-06 Thread Rasmussen, Chris
I have a five node SolrCloud implementation running as a test with no 
replication using a three node zookeeper ensemble.  Admittedly, I'm new to Solr 
and just grinding it out.  Accidentally re-initialized zookeeper with the wrong 
conf dir and I'm trying to recover.  I re-ran the initialization with the 
correct conf dir, but now the indexes are reporting 0 documents.  Logs also 
report that a new index was created in the dataDir called index. Previous 
indexes were in a named directory based on slice/shard.  The previous indexes 
don't appear to have any issues, I just can't re-point the solr cores to 
them.  The Solr.xml file for one of the servers is:

<?xml version="1.0" encoding="UTF-8" ?>
<solr persistent="true">
  <cores adminPath="/admin/cores" hostPort="8502">
    <core schema="schema.xml" shard="slice5" instanceDir="test_s5s1/"
          name="twitter_s5s1" config="solrconfig.xml" collection="test"/>
  </cores>
</solr>

I think I'm missing exactly what the instanceDir provides.  These were the 
directories created when I first set up the servers and where the indexes exist 
that I want to use.

Any thought? Or am I just completely off base here in my description of the 
issue.

Chris


RE: Access DIH from inside application (via Solrj)?

2012-11-06 Thread Dyer, James
DIH & SolrJ don't really support what you want to do.  But you can make it work 
with code like this, which reloads the DIH configuration and checks for the 
response.  Just note this is quite brittle:  whenever the response changes in 
future versions of DIH, it'll break your code.

Map<String, String> paramMap = new HashMap<String, String>();
paramMap.put("command", "reload-config");
SolrParams params = new MapSolrParams(paramMap);
DirectXmlRequest req = new DirectXmlRequest("/dataimporthandler", null);

req.setMethod(METHOD.GET);
req.setParams(params);
NamedList<Object> nl = server.request(req);
String importResponse = (String) nl.get("importResponse");
boolean reloaded = false;
if ("Configuration Re-loaded sucessfully.".equals(importResponse)) {
    reloaded = true;
}
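The same request pattern works for kicking off an import (command=full-import) and for polling it (command=status); while an import runs, the status response carries a status value of "busy", switching to "idle" when it finishes. A minimal, self-contained sketch of pulling that value out of the raw XML — the sample response string and the DihStatus class are assumptions for illustration, not captured from a live server:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Extracts the top-level "status" value ("busy"/"idle") from a DIH status
// response. The sample XML below is abbreviated and assumed, not recorded
// from a running Solr instance.
class DihStatus {
    private static final Pattern STATUS =
        Pattern.compile("<str name=\"status\">([^<]+)</str>");

    static String parseStatus(String response) {
        Matcher m = STATUS.matcher(response);
        return m.find() ? m.group(1) : "unknown";
    }

    public static void main(String[] args) {
        String sample = "<response><str name=\"status\">busy</str>"
                      + "<str name=\"importResponse\">A command is still running...</str></response>";
        System.out.println(parseStatus(sample)); // busy
    }
}
```

Polling this in a loop until the value leaves "busy" gives the "kick off and check status" flow from SolrJ without screen-scraping the full response.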

James Dyer
E-Commerce Systems
Ingram Content Group
(615) 213-4311


-Original Message-
From: Billy Newman [mailto:newman...@gmail.com] 
Sent: Tuesday, November 06, 2012 3:00 PM
To: solr-user@lucene.apache.org
Subject: Access DIH from inside application (via Solrj)?

I know that you can access the DIH interface restfully, which works
pretty well for most cases.  I would like to know however if it is
possible to send/receive commands from a DIH via the SolrJ library.
Basically I would just like to be able to kick off the DIH and maybe
check status.  I can work around this but java is not the best client
for handling/dealing with http/xml.  If I could use Solrj the code
would probably be a lot more straight forward.

Thanks guys/gals,

Billy




Re: Solr / Velocity url rewrite

2012-11-06 Thread Erick Erickson
Not really. Mostly it's whatever you are most comfortable with. Since the
app - solr connection is just HTTP, the front-end is wide open.

FWIW,
Erick


On Tue, Nov 6, 2012 at 8:30 AM, Sébastien Dartigues 
sebastien.dartig...@gmail.com wrote:

 Hi Erick,

 Thanks for your help.
 OK except the php client delivered as a sample, do you have a preference
 for an out-of-the-box front end that's easily deployable?
 My main use case is to be compliant with SEO, or at least to give nice
 (url) entry point.

 Thanks.


 2012/11/6 Erick Erickson erickerick...@gmail.com

  Velocity/Solaritas was never intended to be a user-facing app. How are
 you
   locking things down so a user can't enter, for instance,
   q=<delete><query>*:*</query></delete>&commit=true?
 
  I'd really recommend a proper middleware layer unless you have a trusted
  user base...
 
  FWIW,
  Erick
 
 
  On Tue, Nov 6, 2012 at 4:20 AM, Sébastien Dartigues 
  sebastien.dartig...@gmail.com wrote:
 
   Hi all,
  
   Today i'm using solritas as front-end for the solr search engine.
  
   But i would like to do url rewriting to deliver urls more compliant
 with
   SEO.
  
   First the end user types that kind of url :
  http://host.com/query/myquery
  
   So this url should be rewriten internally (kind of reverse proxy) in
   http://localhost:8983/query?q=myquery.
  
   This internal url should not be displayed to the end user and in return
   when the result page is displayed all the links in the page should be
   rewritten with a SEO compliant url.
  
   I tried to perform some tests with an apache front end by using
 mod_proxy
   but i didn't succeed to pass url parameters.
   Does someone ever tried to do SEO with solr search engine (solritas
  front)?
  
   Thanks for your help.
  
 



Re: SolrCloud failover behavior

2012-11-06 Thread Erick Erickson
I was right for once <G>..

Thanks for updating the Wiki!

Erick


On Tue, Nov 6, 2012 at 9:42 AM, Nick Chase nch...@earthlink.net wrote:

 Thanks a million, Erick!  You're right about killing both nodes hosting
 the shard.  I'll get the wiki corrected.

   Nick


 On 11/3/2012 10:51 PM, Erick Erickson wrote:

 SolrCloud doesn't work unless every shard has at least one server that is
 up and running.

 I _think_ you might be killing both nodes that host one of the shards. The
 admin
 page has a link showing you the state of your cluster. So when this
 happens,
 does that page show both nodes for that shard being down?

 And yeah, SolrCloud requires a quorum of ZK nodes up. So with only one ZK
 node, killing that will bring down the whole cluster. Which is why the
 usual
 recommendation is that ZK be run externally and usually an odd number of
 ZK
 nodes (three or more).

 Anyone can create a login and edit the Wiki, so any clarifications are
 welcome!

 Best
 Erick


 On Sat, Nov 3, 2012 at 12:17 PM, Nick Chase nch...@earthlink.net wrote:

  I think there's a change in the behavior of SolrCloud vs. what's in the
 wiki, but I was hoping someone could confirm for me.  I checked JIRA and
 there were a couple of issues requesting partial results if one server
 comes down, but that doesn't seem to be the issue here.  I also checked
 CHANGES.txt and don't see anything that seems to apply.

 I'm running Example B: Simple two shard cluster with shard replicas
 from
 the wiki at 
 https://wiki.apache.org/solr/SolrCloud
 and
 everything starts out as expected.  However, when I get to the part

 about fail over behavior is when things get a little wonky.

 I added data to the shard running on 7475.  If I kill 7500, a query to
 any
 of the other servers works fine.  But if I kill 7475, rather than getting
 zero results on a search to 8983 or 8900, I get a 503 error:

  <response>
  <lst name="responseHeader">
    <int name="status">503</int>
    <int name="QTime">5</int>
    <lst name="params">
      <str name="q">*:*</str>
    </lst>
  </lst>
  <lst name="error">
    <str name="msg">no servers hosting shard:</str>
    <int name="code">503</int>
  </lst>
  </response>

 I don't see any errors in the consoles.

 Also, if I kill 8983, which includes the Zookeeper server, everything
 dies, rather than just staying in a steady state; the other servers
 continually show:

 Nov 03, 2012 11:39:34 AM org.apache.zookeeper.ClientCnxn$SendThread

 startConnect
  INFO: Opening socket connection to server localhost/0:0:0:0:0:0:0:1:9983
  Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread
 run

  WARNING: Session 0x13ac6cf87890002 for server null, unexpected error,
 closing socket connection and attempting reconnect
  java.net.ConnectException: Connection refused: no further information
 at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
  at sun.nio.ch.SocketChannelImpl.finishConnect(Unknown Source)
  at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1143)

  Nov 03, 2012 11:39:35 AM org.apache.zookeeper.ClientCnxn$SendThread

 startConnect

 over and over again, and a call to any of the servers shows a connection
 error to 8983.

 This is the current 4.0.0 release, running on Windows 7.

 If this is the proper behavior and the wiki needs updating, fine; I just
 need to know.  Otherwise if anybody has any clues as to what I may be
 missing, I'd be grateful. :)

 Thanks...

 ---  Nick





Re: load balance with SolrCloud

2012-11-06 Thread Erick Erickson
This is a complex setup, all right.

A pluggable sharding strategy is definitely something that is on the
roadmap for SolrCloud, but hasn't made it into the code base yet.

Keep in mind, though, that all the SolrCloud goodness centers around the
idea of a single index that may be sharded. I don't think SolrCloud has had
time to really think about handling the situation in which you have a bunch
of cores that may or may not be sharded but are running on the same server.
I don't know that it _doesn't_ work, mind you, but that scenario doesn't
seem like the prime use-case for SolrCloud.

That said, I don't know that such a situation is  _not_ do-able in
SolrCloud. Mostly I haven't explored that kind of functionality yet.

Not much help, I know. I suspect that this is one of those cases where _we_
will learn from _you_ if you try to meld SolrCloud with your setup. Sounds
like a great Wiki page if you do pursue this!


Best
Erick


On Tue, Nov 6, 2012 at 4:58 PM, Jie Sun jsun5...@yahoo.com wrote:

 Hi Eric,
 thanks for your information. I read all the related issues with SOLR-1293
 as
 your just pointed me to.

 It seems they are not very suitable for our scenario.

 We do have couple of hundreds cores (you are right each customer will be
 corresponded to a core) typically on one solr instance. and all of them
 need
 to be actively working with indexing and queries. So we are not having like
 10s of thousands of cores that only part of them need to be loaded.

 Our issues are on some servers that host very large customers, it runs out
 of disk space after some time due to the large among of index data. I have
 written a restful service that is being deployed with solr on tomcat to
 identify the large customer (core) indexing requests and consult with a dns
 service, it then off loads the indexing requests to additional solr
 servers,
 and support queries using solr shards on these servers going forward.

 We also have replicas for each shard, managed by our own software using
 peer
 model (I am thinking about using solr replications after 1.4).

  to me, SolrCloud is like sharding+replication+zookeeper. I could be wrong.
 But if I am right, with very big existing data in our service, and we
 already have a lot of software in place working pretty well utilizing solr
 1.4, I am just trying to figure out if it will worth it to migrate the
 production system to use SolrCloud.

 The problem we need to fix is in one area : I need to automate the off-load
 (sharding) process. Right now we use some monitor system to watch for the
 growth on each server. When we find a fast growing large core(customer), we
 will start to manually configure our dns directory and start adding
 shard(s)
 to it (basically we create a same core name on a different solr
 server/instance). my restful service going forward will then direct the
 queries for the customer onto these sharded cores using solr shards.

 If SolrCloud can not really help me automate this process, it is not very
 attractive to me right now. I have read some of the topics, I looked into
 distributing indexing, distributed update processor ... none of them can
 help the way I have been looking for. So I guess using solrcloud or not, I
 will need to write my own kind of 'load balancer' for indexing, unless I am
 wrong.

 I did come across Jon's white paper on Loggly, I have designed a model
 based
 on what he has done. The solution should be able to automatically creating
  shards, but it will need to rsync index files for a core to a different server
 and use solr merge to merge small core into larger cores, or use core admin
 to add new core on the fly.

 is this approach sounds like someone is already familiar with and had
 out-of-box solution? When I looked into solrcloud, I was expecting some
 pluggable index distributing policy factory I can customize.
 The closest thing I found was  SOLR-2593 (A new core admin action 'split'
 for splitting index ) but not exactly what I wanted.  Let me know if you
 can
 advice me on this more.

 thanks
 Jie



 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367p4018609.html
 Sent from the Solr - User mailing list archive at Nabble.com.



Re: Add new shard will be treated as replicas in Solr4.0?

2012-11-06 Thread Zeng Lames
got it. thanks a lot


On Tue, Nov 6, 2012 at 8:43 PM, Erick Erickson erickerick...@gmail.comwrote:

 bq: where can i find all the items on the road map?

  Well, you really can't <G>... There's no official roadmap. I happen to
 know this since I follow the developer's list and I've seen references to
 this being important to the folks doing SolrCloud development work and it's
 been a recurring theme on the user's list. It's one of those things that
 _everybody_ understands would be useful in certain circumstances, but
 haven't had time to actually implement yet.

 You can track this at: https://issues.apache.org/jira/browse/SOLR-2592

 Best
 Erick



 On Mon, Nov 5, 2012 at 7:57 PM, Zeng Lames lezhi.z...@gmail.com wrote:

  btw, where can i find all the items in the road map? thanks!
 
 
  On Tue, Nov 6, 2012 at 8:55 AM, Zeng Lames lezhi.z...@gmail.com wrote:
 
   hi Erick, thanks for your kindly response. hv got the information from
  the
   SolrCloud wiki.
    think we may need to define the shard numbers when we really roll it
  out.
  
   thanks again
  
  
   On Mon, Nov 5, 2012 at 8:40 PM, Erick Erickson 
 erickerick...@gmail.com
  wrote:
  
   Not at present. What you're interested in is shard splitting which
 is
   certainly on the roadmap but not implemented yet. To expand the
   number of shards you'll have to reconfigure, then re-index.
  
   Best
   Erick
  
  
   On Mon, Nov 5, 2012 at 4:09 AM, Zeng Lames lezhi.z...@gmail.com
  wrote:
  
Dear All,
   
we have an existing solr collection, 2 shards, numOfShard is 2. and
   there
are already records in the index files. now we start another solr
   instance
    with ShardId=shard3, and found that Solr treats it as a replica.
   
check the zookeeper data, found the range of shard doesn't
change correspondingly. shard 1 is 0-7fff, while shard 2 is
8000-.
   
is there any way to increase new shard for existing collection?
   
thanks a lot!
Lames
   
  
  
  
 



Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread roz dev
Erick

We have a requirement where the search admin can add or remove some synonyms and
would want these changes to be reflected in search thereafter.

yes, we looked at reload command and it seems to be suitable for that
purpose. We have a master and slave setup so it should be OK to issue
reload command on master. I expect that slaves will pull the latest config
files.

Is reload operation very costly, in terms of time and cpu? We have a
multicore setup and would need to issue reload on multiple cores.

Thanks
Saroj


On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.comwrote:

 Not that I know of. This would be extremely expensive in the usual case.
 Loading up configs, reconfiguring all the handlers etc. would add a huge
 amount of overhead to the commit operation, which is heavy enough as it is.

 What's the use-case here? Changing your configs really often and reading
 them on commit sounds like a way to make for a very confusing application!

 But if you really need to re-read all this info on a running system,
 consider the core admin RELOAD command.

 Best
 Erick


 On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote:

  Hi All
 
  I am keen to find out if Solr exposes any event listener or other hooks
  which can be used to re-read configuration files.
 
 
  I know that we have firstSearcher event but I am not sure if it causes
  request handlers to reload themselves and read the conf files again.
 
  For example, if I change the synonym file and solr gets a commit, will it
  re-initialize request handlers and re-read the conf files.
 
  Or, are there some events which can be listened to?
 
  Any inputs are welcome.
 
  Thanks
  Saroj
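For reference, the CoreAdmin RELOAD mentioned above is just an HTTP call, issued once per core in a multicore setup. A minimal sketch of building those URLs with Python's standard library; the host, port, and core names are placeholders:

```python
from urllib.parse import urlencode

def reload_core_url(host, port, core):
    """Build the CoreAdmin RELOAD URL for a single core."""
    params = urlencode({"action": "RELOAD", "core": core})
    return f"http://{host}:{port}/solr/admin/cores?{params}"

# hypothetical core names; issue one request per core
for core in ("core0", "core1"):
    print(reload_core_url("localhost", 8983, core))
```

In a master/slave setup the RELOAD would be issued on the master; slaves pick up the new config files on their next replication poll if config-file replication is enabled.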
 



Re: load balance with SolrCloud

2012-11-06 Thread Jie Sun
thanks for your feedback Erick.

I am also aware of the current limitation that the number of shards in a
collection is fixed; changing the number requires re-configuring and
re-indexing. Let's say the limitation gets lifted in a near-future release. I
would then consider setting up a collection for each customer, which will
include a varying number of shards and their replicas (depending on the
customer's size, and it should grow dynamically).

So this will lead to having multiple collections on one Solr server
instance... I assume setting up n collections on one server is not an issue?
Or is it? I am skeptical; see the example from the Solr wiki below, which
seems to start a Solr instance with one specific collection and its config:
cd example
java -Dbootstrap_confdir=./solr/collection1/conf
-Dcollection.configName=myconf -DzkRun -DnumShards=2 -jar start.jar

thanks
Jie



--
View this message in context: 
http://lucene.472066.n3.nabble.com/load-balance-with-SolrCloud-tp4018367p4018659.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread Otis Gospodnetic
Hi,

Note about modifying synonyms - you need to reindex, really, if using
index-time synonyms. And if you're using search-time synonyms you have
multi-word synonym issue described on the Wiki.

Otis
--
Performance Monitoring - http://sematext.com/spm
On Nov 6, 2012 11:02 PM, roz dev rozde...@gmail.com wrote:

 Erick

 We have a requirement where the search admin can add or remove some synonyms and
 would want these changes to be reflected in search thereafter.

 yes, we looked at reload command and it seems to be suitable for that
 purpose. We have a master and slave setup so it should be OK to issue
 reload command on master. I expect that slaves will pull the latest config
 files.

 Is reload operation very costly, in terms of time and cpu? We have a
 multicore setup and would need to issue reload on multiple cores.

 Thanks
 Saroj


 On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com
 wrote:

  Not that I know of. This would be extremely expensive in the usual case.
  Loading up configs, reconfiguring all the handlers etc. would add a huge
  amount of overhead to the commit operation, which is heavy enough as it
 is.
 
  What's the use-case here? Changing your configs really often and reading
  them on commit sounds like a way to make for a very confusing
 application!
 
  But if you really need to re-read all this info on a running system,
  consider the core admin RELOAD command.
 
  Best
  Erick
 
 
  On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote:
 
   Hi All
  
   I am keen to find out if Solr exposes any event listener or other hooks
   which can be used to re-read configuration files.
  
  
   I know that we have firstSearcher event but I am not sure if it causes
   request handlers to reload themselves and read the conf files again.
  
   For example, if I change the synonym file and solr gets a commit, will
 it
   re-initialize request handlers and re-read the conf files.
  
   Or, are there some events which can be listened to?
  
   Any inputs are welcome.
  
   Thanks
   Saroj
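To make the index-time vs. query-time distinction above concrete, here is a sketch of the two analyzer chains in schema.xml; the field type name and synonym file names are illustrative, not taken from the thread:

```xml
<fieldType name="text_syn" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- index-time synonyms: editing this file requires a full reindex -->
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- query-time synonyms: picked up on core RELOAD, but multi-word
         synonyms are problematic at query time -->
    <filter class="solr.SynonymFilterFactory" synonyms="query_synonyms.txt"
            ignoreCase="true" expand="true"/>
  </analyzer>
</fieldType>
```

This split matches the compromise discussed in the thread: query-time synonyms for single-word entries (reloadable), index-time synonyms for multi-word entries (reindex required).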
  
 



Two questions about solrcloud

2012-11-06 Thread SuoNayi
Hi all, sorry for newbie questions about SolrCloud.
Here are my two questions:
1. If I have a SolrCloud cluster with two shards and 0 replicas on two different
servers, when one of the servers restarts, will the Solr instance on that server
replay the transaction log to make sure these operations are persisted to the
index files (commit the transaction log)?
 
2. Assuming I have a 3-shard cluster on 4 different servers, it will form a
cluster with 3 shards and 1 replica. Can I remove one server to reduce
(degrade) the number of servers? If so, do I just need to shut down the server
and manually remove its node from ZK?
 
Regards

SuoNayi

Re: Solr Replication is not Possible on RAMDirectory?

2012-11-06 Thread deniz
Erik Hatcher-4 wrote
 There's an open issue (with a patch!) that enables this, it seems:
 <https://issues.apache.org/jira/browse/SOLR-3911>
 
   Erik

well, the patch does not seem to do that... I have tried it and am still getting
some error lines about the directory types




-
Smart, but it doesn't work... If it worked, it would get things done...
--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-Replication-is-not-Possible-on-RAMDirectory-tp4017766p4018670.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: How to re-read the config files in Solr, on a commit

2012-11-06 Thread roz dev
Thanks Otis for pointing this out.

We may end up using search-time synonyms for single-word synonyms and
index-time synonyms for multi-word synonyms.

-Saroj


On Tue, Nov 6, 2012 at 8:09 PM, Otis Gospodnetic otis.gospodne...@gmail.com
 wrote:

 Hi,

 Note about modifying synonyms - you need to reindex, really, if using
 index-time synonyms. And if you're using search-time synonyms you have
 multi-word synonym issue described on the Wiki.

 Otis
 --
 Performance Monitoring - http://sematext.com/spm
 On Nov 6, 2012 11:02 PM, roz dev rozde...@gmail.com wrote:

  Erick
 
  We have a requirement where the search admin can add or remove some synonyms
 and
  would want these changes to be reflected in search thereafter.
 
  yes, we looked at reload command and it seems to be suitable for that
  purpose. We have a master and slave setup so it should be OK to issue
  reload command on master. I expect that slaves will pull the latest
 config
  files.
 
  Is reload operation very costly, in terms of time and cpu? We have a
  multicore setup and would need to issue reload on multiple cores.
 
  Thanks
  Saroj
 
 
  On Tue, Nov 6, 2012 at 5:02 AM, Erick Erickson erickerick...@gmail.com
  wrote:
 
   Not that I know of. This would be extremely expensive in the usual
 case.
   Loading up configs, reconfiguring all the handlers etc. would add a
 huge
   amount of overhead to the commit operation, which is heavy enough as it
  is.
  
   What's the use-case here? Changing your configs really often and
 reading
   them on commit sounds like a way to make for a very confusing
  application!
  
   But if you really need to re-read all this info on a running system,
   consider the core admin RELOAD command.
  
   Best
   Erick
  
  
   On Mon, Nov 5, 2012 at 8:43 PM, roz dev rozde...@gmail.com wrote:
  
Hi All
   
I am keen to find out if Solr exposes any event listener or other
 hooks
which can be used to re-read configuration files.
   
   
I know that we have firstSearcher event but I am not sure if it
 causes
request handlers to reload themselves and read the conf files again.
   
For example, if I change the synonym file and solr gets a commit,
 will
  it
re-initialize request handlers and re-read the conf files.
   
Or, are there some events which can be listened to?
   
Any inputs are welcome.
   
Thanks
Saroj