Re: Update field properties via Schema Rest API ?

2013-09-30 Thread Upayavira
Updating a field isn't straightforward. Changing its type from string to int
would, if you don't re-index, break your index. The schema tells Solr how
to interpret the binary bits it finds in the index. If there are no bits
in the index for that field name, there's no issue. If there already are
bits in the index, changing the schema will cause Solr to get confused
when those bits are returned from the index in the results of a query.

It seems to me that you could get away with using dynamic fields. Start
with size_s, which is a string field. Then, start adding a size_i field
to your index. It is a new field containing the integer version. Because
of the dynamic field definitions in the schema, these fields would not
require schema editing or core reloads.
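
For reference, a typical pair of dynamic field definitions in schema.xml
looks something like this (the *_s/*_i suffix convention is the stock Solr
example, so treat this as a sketch rather than your exact schema):

  <dynamicField name="*_s" type="string" indexed="true" stored="true"/>
  <dynamicField name="*_i" type="int" indexed="true" stored="true"/>

With these in place, size_s and size_i pick up their types from the suffix
alone, with no schema edit or core reload.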

If, however, you want every document in the index to have the integer
size value, you will need to update those documents, adding that new
field (and skipping/removing the string one if no longer needed).
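
If atomic updates are available (Solr 4.x, with the fields stored), the
per-document migration could look roughly like this JSON update body (the
id and value here are invented for illustration):

  [{"id": "doc1", "size_i": {"set": 42}, "size_s": {"set": null}}]

"set" writes the new integer field, and setting the old field to null
removes it from the document.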

Hope this helps.

Upayavira

On Sat, Sep 28, 2013, at 04:38 PM, bengates wrote:
 Haha,
 
 Thanks for your reply, that's what I'll do then.
 
 Unfortunately I can speak Java as well as I can speak ancient Chinese in
 Sign Language... ^^
 
 
 
 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Update-field-properties-via-Schema-Rest-API-tp4087907p4092507.html
 Sent from the Solr - User mailing list archive at Nabble.com.


Re: Hello and help :)

2013-09-30 Thread Upayavira
If your app and solr aren't far apart, you shouldn't be afraid of
multiple queries to solr per user request (I once discovered an app that
did 36 hits to solr per user request, and despite such awfulness of
design, no user ever complained about speed).

You could do a query to Solr for q=+user_id:X +date:[dateX TO dateY] to
find out how many docs there are, then take the numFound value; if it is
above Y, do a subsequent query to retrieve the docs, either all of them
or those in the relevant date range.
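
As a sketch, the two requests might look like this (X, dateX, dateY and
the collection name are placeholders, and parameters would be URL-encoded
in practice):

  /solr/collection1/select?q=+user_id:X +date:[dateX TO dateY]&rows=0
  /solr/collection1/select?q=+user_id:X +date:[dateX TO dateY]&rows=<numFound>

The first call with rows=0 is cheap, since only numFound is needed; the
second is issued only when numFound exceeds Y.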

Don't know if that helps.

Upayavira

On Sun, Sep 29, 2013, at 05:15 PM, Matheus Salvia wrote:
 Thanks for the answer. Yes, you understood it correctly.
 The method you proposed should work perfectly, except I do have one more
 requirement that I forgot to mention earlier, and I apologize for that.
 The true problem we are facing is:
 * find all documents for userID=x, where userID=x has more than y
  documents in the index between dateA and dateB
 
 And since dateA and dateB can be any dates, it's impossible to save the
 count, since we cannot foresee what date range and what count will be
 requested.
 
 
 2013/9/28 Upayavira u...@odoko.co.uk
 
  To phrase your need more generically:
 
   * find all documents for userID=x, where userID=x has more than y
   documents in the index
 
  Is that correct?
 
  If it is, I'd probably do some work at index time. First guess, I'd keep
  a separate core, which has a very small document per user, storing just:
 
   * userID
   * docCount
 
  Then, when you add/delete a document, you use atomic updates to either
  increase or decrease the docCount on that user doc.
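  
  As a sketch, such an atomic update (field names from the suggestion
  above, JSON update body, assuming the fields are stored so atomic
  updates work) might look like:
  
   [{"userID": "x", "docCount": {"inc": 1}}]
  
  where "inc" with a negative value would decrement the count on delete.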
 
  Then you can use a pseudo join between these two cores relatively
  easily.
 
  q=user_id:x {!join fromIndex=user from=user_id to=user_id}+user_id:x
  +doc_count:[y TO *]
 
  Worst case, if you don't want to mess with your indexing code, I wonder
  if you could use a ScriptUpdateProcessor to do this work - not sure if
  you can have one add an entirely new, additional, document to the list,
  but may be possible.
 
  Upayavira
 
  On Fri, Sep 27, 2013, at 09:50 PM, Matheus Salvia wrote:
   Sure, sorry for the inconvenience.
  
    I'm having a little trouble trying to make a query in Solr. The problem
    is: I must be able to retrieve documents that have the same value for a
    specified field, but they should only be retrieved if this value appeared
    more than X times for a specified user. In pseudo-SQL it would be
    something like:
  
   select user_id from documents
   where my_field=my_value
   and
   (select count(*) from documents where my_field=my_value and
    user_id=super.user_id) > X
  
    I know that Solr returns a 'numFound' for each query you make, but I don't
    know how to retrieve this value in a subquery.
  
    My Solr is organized so that a user is a document, and the properties of
    the user (such as name, age, etc.) are grouped in another document with a
    'root_id' field. So let's suppose the following query, which gets all the
    root documents whose children have the prefix some_prefix:
  
    is_root:true AND _query_:"{!join from=root_id
    to=id}requests_prefix:\"some_prefix\""
  
    Now, how can I get the root documents (users, in some sense) that have
    more than X children matching 'requests_prefix:some_prefix' or any other
    condition? Is it possible?
  
    P.S. It must be done in a single query; fields can be added at will, but
    the root/children structure should be preserved (preferably).
  
  
   2013/9/27 Upayavira u...@odoko.co.uk
  
Mattheus,
   
 Given these mails form part of an archive and should be self-contained,
 can you please post your actual question here? You're more likely to get
 answers that way.
   
Thanks, Upayavira
   
On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote:
 Hello everyone,
  I'm having a problem regarding how to make a Solr query; I've posted it
  on Stack Overflow.
  Can someone help me?

   
  http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter

 Thanks in advance!

 --
 --
  // Matheus Salvia
 Desenvolvedor Mobile
 Celular: +55 11 9-6446-2332
 Skype: meta.faraday
   
  
  
  
   --
   --
// Matheus Salvia
   Desenvolvedor Mobile
   Celular: +55 11 9-6446-2332
   Skype: meta.faraday
 
 
 
 
 -- 
 --
  // Matheus Salvia
 Desenvolvedor Mobile
 Celular: +55 11 9-6446-2332
 Skype: meta.faraday


Re: Maximum solr processes per machine

2013-09-30 Thread adfel70
Bram Van Dam wrote
 On 09/29/2013 04:03 PM, adfel70 wrote:
 If you're doing real time on a 5TB index then you'll probably want to 
 throw your money at the fastest storage you can afford (SSDs vs spinning 
 rust made a huge difference in our benchmarks) and the fastest CPUs you 
 can get your hands on. Memory is important too, but in our benchmarks 
 that didn't have as much impact as the other factors. Keeping a 5TB 
 index in memory is going to be tricky, so in my opinion you'd be better 
 off investing in faster disks instead.

Can you please elaborate on your benchmarks? What was the cluster size,
which hardware (CPUs, RAM size, disk type...), and so on?
This info might really help us.
Also, if I understand you correctly, beyond a certain index size the impact
of RAM size is less important than disk performance?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Maximum-solr-processes-per-machine-tp4092568p4092651.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: autocomplete_edge type split words

2013-09-30 Thread elisabeth benoit
In fact, I've removed autoGeneratePhraseQueries=true, and it doesn't
change anything: behaviour is the same with or without it (i.e. the request
with debugQuery=on is the same).

Thanks for your comments.

Best,
Elisabeth


2013/9/28 Erick Erickson erickerick...@gmail.com

 You've probably been doing this right along, but adding
 debug=query will show the parsed query.

 I really question, though, your apparent combination of
 autoGeneratePhraseQueries with what looks like an ngram field.
 I'm not at all sure how those would interact...

 Best,
 Erick

 On Fri, Sep 27, 2013 at 10:12 AM, elisabeth benoit
 elisaelisael...@gmail.com wrote:
  Yes!
 
  What I've done is set autoGeneratePhraseQueries to true for my field, then
  give it a boost (bq=myAutompleteEdgeNGramField:"my query with spaces"^50).
  This only worked with autoGeneratePhraseQueries=true, for a reason I
  didn't understand,
 
  since when I did

  q=myAutompleteEdgeNGramField:"my query with spaces"

  I didn't need autoGeneratePhraseQueries set to true.
 
  And another thing: when I tried

  q=myAutocompleteNGramField:(my query with spaces) OR
  myAutompleteEdgeNGramField:"my query with spaces"

  (with a request handler using edismax and default operator AND), the
  request on myAutocompleteNGramField would OR the grams, so I had to put an
  explicit AND (myAutocompleteNGramField:(my AND query AND with AND
  spaces)), which was pretty ugly.
 
  I don't always understand what is exactly going on. If you have a pointer
  to some text I could read to get more insights about this, please let me
  know.
 
  Thanks again,
  Best regards,
  Elisabeth
 
 
 
 
  2013/9/27 Erick Erickson erickerick...@gmail.com
 
  Have you looked at autoGeneratePhraseQueries? That might help.
 
  If that doesn't work, you can always do something like add an OR clause
  like
  OR original query
  and optionally boost it high. But I'd start with the autoGenerate bits.
 
  Best,
  Erick
 
 
  On Fri, Sep 27, 2013 at 7:37 AM, elisabeth benoit
  elisaelisael...@gmail.com wrote:
   Thanks for your answer.
  
   So I guess if someone wants to search on two fields, one with a phrase
   query and one with a normal query (split into words), one has to find a
   way to send the query twice: once with quotes and once without...
  
   Best regards,
   Elisabeth
  
  
   2013/9/27 Erick Erickson erickerick...@gmail.com
  
   This is a classic issue where there's confusion between
   the query parser and field analysis.
  
   Early in the process the query parser has to take the input
   and break it up. that's how, for instance, a query like
   text:term1 term2
   gets parsed as
   text:term1 defaultfield:term2
   This happens long before the terms get to the analysis chain
   for the field.
  
   So your only options are to either quote the string or
   escape the spaces.
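   
    As a sketch, with a hypothetical field f, either of these keeps the
    input together as one token:
   
    q=f:"rue de la"
    q=f:rue\ de\ la
   
    whereas a bare q=f:rue de la sends only "rue" to that field.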
  
   Best,
   Erick
  
   On Wed, Sep 25, 2013 at 9:24 AM, elisabeth benoit
   elisaelisael...@gmail.com wrote:
Hello,
   
I am using solr 4.2.1 and I have a autocomplete_edge type defined
 in
schema.xml
   
   
    <fieldType name="autocomplete_edge" class="solr.TextField">
      <analyzer type="index">
        <charFilter class="solr.MappingCharFilterFactory"
            mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
            replacement=" " replace="all"/>
        <filter class="solr.EdgeNGramFilterFactory" maxGramSize="30"
            minGramSize="1"/>
      </analyzer>
      <analyzer type="query">
        <charFilter class="solr.MappingCharFilterFactory"
            mapping="mapping-ISOLatin1Accent.txt"/>
        <tokenizer class="solr.KeywordTokenizerFactory"/>
        <filter class="solr.LowerCaseFilterFactory"/>
        <filter class="solr.PatternReplaceFilterFactory" pattern="\s+"
            replacement=" " replace="all"/>
        <filter class="solr.PatternReplaceFilterFactory"
            pattern="^(.{30})(.*)?" replacement="$1" replace="all"/>
      </analyzer>
    </fieldType>
   
    When I have a request with more than one word, for instance rue de la,
    my request doesn't match on my autocomplete_edge field unless I use
    quotes around the query. In other words, q=rue de la doesn't work and
    q="rue de la" works.
   
    I've checked the request with debugQuery=on, and I can see that in the
    first case the query is split into words, and I don't understand why,
    since my field type uses KeywordTokenizerFactory.
   
Does anyone have a clue on how I can request my field without using
   quotes?
   
Thanks,
Elisabeth
  
 



documents are not commited distributively in solr cloud tomcat with core discovery, range is null for shards in clusterstate.json

2013-09-30 Thread Liu Bo
Hi all

I'm trying out the SolrCloud tutorial, and I managed to write my own
plugin to import data from our set of databases. I use SolrWriter from the
DataImporter package, and the docs get distributed commits to the shards.

Everything works fine using Jetty from the Solr example, but when I move
to Tomcat, SolrCloud doesn't seem to be configured right: the documents
are just committed to the shard the update request goes to.

The cause is probably that the range is null for the shards in
clusterstate.json. The router is implicit instead of compositeId as well.

Is there anything missed or configured wrong in the following steps? How
can I fix it? Your help will be much appreciated.

PS: the SolrCloud Tomcat wiki page isn't up to date with 4.4 core
discovery; I'm trying this out after reading the SolrCloud, SolrCloudJboss,
and CoreAdmin wiki pages.

Here's what I've done and some useful logs:

1. Start three ZooKeeper servers.
2. Upload configuration files to ZooKeeper; the collection name is
content_collection.
3. Start three Tomcat instances on three servers with core discovery.

a) core.properties:
 name=content
 loadOnStartup=true
 transient=false
 shard=shard1   (different on each server)
 collection=content_collection
b) solr.xml

 <solr>
   <solrcloud>
     <str name="host">${host:}</str>
     <str name="hostContext">${hostContext:solr}</str>
     <int name="hostPort">8080</int>
     <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
     <str name="zkHost">10.199.46.176:2181,10.199.46.165:2181,10.199.46.158:2181</str>
     <bool name="genericCoreNodeNames">${genericCoreNodeNames:true}</bool>
   </solrcloud>

   <shardHandlerFactory name="shardHandlerFactory"
       class="HttpShardHandlerFactory">
     <int name="socketTimeout">${socketTimeout:0}</int>
     <int name="connTimeout">${connTimeout:0}</int>
   </shardHandlerFactory>
 </solr>

4. In solr.log I can see the three shards are recognized, and SolrCloud
sees that content_collection has three shards as well.
5. When I write documents to content_collection using my update request,
the documents are only committed to the shard the request goes to. In the
log I can see that DistributedUpdateProcessorFactory is in the processor
chain and a distributed commit is triggered:

INFO  - 2013-09-30 16:31:43.205;
com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
updata request processor factories:

INFO  - 2013-09-30 16:31:43.206;
com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
org.apache.solr.update.processor.LogUpdateProcessorFactory@4ae7b77

INFO  - 2013-09-30 16:31:43.207;
com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
org.apache.solr.update.processor.*DistributedUpdateProcessorFactory*
@5b2bc407

INFO  - 2013-09-30 16:31:43.207;
com.microstrategy.alert.search.solr.plugin.index.handler.IndexRequestHandler;
org.apache.solr.update.processor.RunUpdateProcessorFactory@1652d654

INFO  - 2013-09-30 16:31:43.283; org.apache.solr.core.SolrDeletionPolicy;
SolrDeletionPolicy.onInit: commits: num=1


commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}

INFO  - 2013-09-30 16:31:43.284; org.apache.solr.core.SolrDeletionPolicy;
newest commit generation = 1

INFO  - 2013-09-30 16:31:43.440; *org.apache.solr.update.SolrCmdDistributor;
Distrib commit to*:[StdNode: http://10.199.46.176:8080/solr/content/,
StdNode: http://10.199.46.165:8080/solr/content/]
params:commit_end_point=truecommit=truesoftCommit=falsewaitSearcher=trueexpungeDeletes=false

but the documents won't go to the other shards; the other shards only see
a commit request with no documents:

INFO  - 2013-09-30 16:31:43.841;
org.apache.solr.update.DirectUpdateHandler2; start
commit{,optimize=false,openSearcher=true,waitSearcher=true,expungeDeletes=false,softCommit=false,prepareCommit=false}

INFO  - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy;
SolrDeletionPolicy.onInit: commits: num=1

commit{dir=/home/bold/work/tomcat/solr/content/data/index,segFN=segments_1,generation=1}

INFO  - 2013-09-30 16:31:43.855; org.apache.solr.core.SolrDeletionPolicy;
newest commit generation = 1

INFO  - 2013-09-30 16:31:43.856;
org.apache.solr.update.DirectUpdateHandler2; No uncommitted changes.
Skipping IW.commit.

INFO  - 2013-09-30 16:31:43.865; org.apache.solr.search.SolrIndexSearcher;
Opening Searcher@3c74c144 main

INFO  - 2013-09-30 16:31:43.869; org.apache.solr.core.QuerySenderListener;
QuerySenderListener sending requests to
Searcher@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)}

INFO  - 2013-09-30 16:31:43.869; org.apache.solr.core.QuerySenderListener;
QuerySenderListener done.

INFO  - 2013-09-30 16:31:43.869; org.apache.solr.core.SolrCore; [content]
Registered new searcher
Searcher@3c74c144main{StandardDirectoryReader(segments_1:1:nrt)}

INFO  - 2013-09-30 16:31:43.870;
org.apache.solr.update.DirectUpdateHandler2; end_commit_flush

INFO  - 2013-09-30 

solr 4.4 config trouble

2013-09-30 Thread Marc des Garets

Hi,

I'm running Solr in Tomcat. I am trying to upgrade to Solr 4.4 but I 
can't get it to work. Could someone point out what I'm doing wrong?


tomcat context:
<Context docBase="/opt/solr4.4/dist/solr-4.4.0.war" debug="0"
    crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
      value="/opt/solr4.4/solr_address" override="true" />
</Context>


core.properties:
name=address
collection=address
coreNodeName=address
dataDir=/opt/indexes4.1/address


solr.xml:
<?xml version="1.0" encoding="UTF-8" ?>
<solr>
  <solrcloud>
    <str name="host">${host:}</str>
    <int name="hostPort">8080</int>
    <str name="hostContext">solr_address</str>
    <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
    <bool name="genericCoreNodeNames">false</bool>
  </solrcloud>

  <shardHandlerFactory name="shardHandlerFactory"
      class="HttpShardHandlerFactory">
    <int name="socketTimeout">${socketTimeout:0}</int>
    <int name="connTimeout">${connTimeout:0}</int>
  </shardHandlerFactory>
</solr>


In solrconfig.xml I have:
<luceneMatchVersion>4.1</luceneMatchVersion>

<dataDir>/opt/indexes4.1/address</dataDir>


And the log4j logs in catalina.out:
...
INFO: Deploying configuration descriptor solr_address.xml
0 [main] INFO org.apache.solr.servlet.SolrDispatchFilter – 
SolrDispatchFilter.init()
24 [main] INFO org.apache.solr.core.SolrResourceLoader – Using JNDI 
solr.home: /opt/solr4.4/solr_address
26 [main] INFO org.apache.solr.core.SolrResourceLoader – new 
SolrResourceLoader for directory: '/opt/solr4.4/solr_address/'
176 [main] INFO org.apache.solr.core.ConfigSolr – Loading container 
configuration from /opt/solr4.4/solr_address/solr.xml
272 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address
276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address/conf
276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address/conf/xslt
277 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address/conf/lang
278 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for 
cores in /opt/solr4.4/solr_address/conf/velocity
283 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer 
991552899
284 [main] INFO org.apache.solr.core.CoreContainer – Loading cores into 
CoreContainer [instanceDir=/opt/solr4.4/solr_address/]
301 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
socketTimeout to: 0
301 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
urlScheme to: http://
301 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
connTimeout to: 0
302 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
maxConnectionsPerHost to: 20
302 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
corePoolSize to: 0
303 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
maximumPoolSize to: 2147483647
303 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
maxThreadIdleTime to: 5
303 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
sizeOfQueue to: -1
303 [main] INFO 
org.apache.solr.handler.component.HttpShardHandlerFactory – Setting 
fairnessPolicy to: false
320 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – 
Creating new http client, 
config:maxConnectionsPerHost=20maxConnections=1socketTimeout=0connTimeout=0retry=false
420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log 
Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper 
client=192.168.10.206:2181
429 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – 
Creating new http client, 
config:maxConnections=500maxConnectionsPerHost=16socketTimeout=0connTimeout=0
487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting 
for client to connect to ZooKeeper
540 [main-EventThread] INFO 
org.apache.solr.common.cloud.ConnectionManager – Watcher 
org.apache.solr.common.cloud.ConnectionManager@7dc21ece 
name:ZooKeeperConnection Watcher:192.168.10.206:2181 got event 
WatchedEvent state:SyncConnected type:None path:null path:null type:None
541 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Client 
is connected to ZooKeeper
562 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/overseer/queue
578 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/overseer/collection-queue-work
591 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/live_nodes
595 [main] INFO org.apache.solr.cloud.ZkController – Register node as 
live in ZooKeeper:/live_nodes/192.168.10.206:8080_solr_address
600 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/live_nodes/192.168.10.206:8080_solr_address
606 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
/collections
613 [main] INFO 

Solr takes too long to start up

2013-09-30 Thread Zenith
Hi all and thanks in advance for any help with this issue I am having...

Loading halts here: 

Sep 30, 2013 9:38:04 AM org.apache.solr.core.QuerySenderListener newSearcher
INFO: QuerySenderListener sending requests to Searcher@687de17d 
main{StandardDirectoryReader(segments_1k:268 _2r(4.2):C12590)}

Once I flush the index and repopulate it, Solr loads up normally. I suspect 
the index is somehow getting corrupted.
I also get the following errors on startup (these are related to the Tomcat 
admin pages, which I do not use, and Solr has run fine in the past with them):


INFO: QuerySenderListener sending requests to Searcher@252ac42e 
main{StandardDirectoryReader(segments_1k:268 _2r(4.2):C12590)}
Sep 30, 2013 9:52:13 AM org.apache.coyote.AbstractProtocol init
INFO: Initializing ProtocolHandler [http-bio-8080]
Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 1018 ms
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardService startInternal
INFO: Starting service Catalina
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardEngine startInternal
INFO: Starting Servlet Engine: Apache Tomcat/7.0.30
Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor 
/etc/tomcat7/Catalina/localhost/host-manager.xml
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext resourcesStart
SEVERE: Error starting static Resources
java.lang.IllegalArgumentException: Document base 
/usr/share/tomcat7-admin/host-manager does not exist or is not a readable 
directory
at 
org.apache.naming.resources.FileDirContext.setDocBase(FileDirContext.java:140)
at 
org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4906)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5086)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
at 
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650)
at 
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error in resourceStart()
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error getConfigured
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Context [/host-manager] startup failed due to previous errors
Sep 30, 2013 9:52:13 AM org.apache.catalina.startup.HostConfig deployDescriptor
INFO: Deploying configuration descriptor 
/etc/tomcat7/Catalina/localhost/manager.xml
Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext resourcesStart
SEVERE: Error starting static Resources
java.lang.IllegalArgumentException: Document base 
/usr/share/tomcat7-admin/manager does not exist or is not a readable directory
at 
org.apache.naming.resources.FileDirContext.setDocBase(FileDirContext.java:140)
at 
org.apache.catalina.core.StandardContext.resourcesStart(StandardContext.java:4906)
at 
org.apache.catalina.core.StandardContext.startInternal(StandardContext.java:5086)
at org.apache.catalina.util.LifecycleBase.start(LifecycleBase.java:150)
at 
org.apache.catalina.core.ContainerBase.addChildInternal(ContainerBase.java:901)
at 
org.apache.catalina.core.ContainerBase.addChild(ContainerBase.java:877)
at org.apache.catalina.core.StandardHost.addChild(StandardHost.java:618)
at 
org.apache.catalina.startup.HostConfig.deployDescriptor(HostConfig.java:650)
at 
org.apache.catalina.startup.HostConfig$DeployDescriptor.run(HostConfig.java:1582)
at 
java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:471)
at java.util.concurrent.FutureTask$Sync.innerRun(FutureTask.java:334)
at java.util.concurrent.FutureTask.run(FutureTask.java:166)
at 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
at java.lang.Thread.run(Thread.java:722)

Sep 30, 2013 9:52:13 AM org.apache.catalina.core.StandardContext startInternal
SEVERE: Error in 

cookies sent by solrj to SOLR

2013-09-30 Thread Dmitry Kan
Hello!

We have recorded the TCP stream between the client using SolrJ to send
requests to Solr, and arrived at the following header (body omitted):




POST /solr/core0/select HTTP/1.1

Content-Charset: UTF-8

Content-Type: application/x-www-form-urlencoded; charset=UTF-8

User-Agent: Solr[org.apache.solr.client.solrj.impl.HttpSolrServer] 1.0

Content-Length: 6163

Host: host:port

Connection: Keep-Alive

Cookie: visited=yes

Cookie2: $Version=1


Can someone please explain what effect these two cookies have on the
frontend Solr that has shards underneath it? This is Solr 4.3.1.

Thanks,

Dmitry


Re: Solr takes too long to start up

2013-09-30 Thread Zenith
As a follow-up, it looks like it's related to this thread:
http://lucene.472066.n3.nabble.com/spellcheck-causing-Core-Reload-to-hang-td4089866.html

Disabling spellcheck gave a normal restart.



On Sep 30, 2013, at 12:54 PM, Zenith wrote:

 Hi all and thanks in advance for any help with this issue I am having...
 
 Loading halts here: 
 
 Sep 30, 2013 9:38:04 AM org.apache.solr.core.QuerySenderListener newSearcher
 INFO: QuerySenderListener sending requests to Searcher@687de17d 
 main{StandardDirectoryReader(segments_1k:268 _2r(4.2):C12590)}
 
 Once i flush the index and repopulate it loads up normally. I suspect somehow 
 the index is getting corrupt.

OpenJDK or OracleJDK

2013-09-30 Thread Raheel Hasan
Hi guyz,

I am trying to setup a server.

Could someone tell me if OpenJDK or OracleJDK will be best for Apache Solr
over CentOS?

Thanks a lot.

-- 
Regards,
Raheel Hasan


Re: solr 4.4 config trouble

2013-09-30 Thread Siegfried Goeschl
Hi Marc,

what exactly is not working - no obvious problems in the logs as far as I can see

Cheers,

Siegfried Goeschl

On 30.09.2013 at 11:44, Marc des Garets m...@ttux.net wrote:

 Hi,
 
 I'm running solr in tomcat. I am trying to upgrade to solr 4.4 but I can't 
 get it to work. If someone can point me at what I'm doing wrong.
 
 tomcat context:
 <Context docBase="/opt/solr4.4/dist/solr-4.4.0.war" debug="0" 
 crossContext="true">
 <Environment name="solr/home" type="java.lang.String" 
 value="/opt/solr4.4/solr_address" override="true" />
 </Context>
 
 
 core.properties:
 name=address
 collection=address
 coreNodeName=address
 dataDir=/opt/indexes4.1/address
 
 
 solr.xml:
 <?xml version="1.0" encoding="UTF-8" ?>
 <solr>
 <solrcloud>
 <str name="host">${host:}</str>
 <int name="hostPort">8080</int>
 <str name="hostContext">solr_address</str>
 <int name="zkClientTimeout">${zkClientTimeout:15000}</int>
 <bool name="genericCoreNodeNames">false</bool>
 </solrcloud>
 
 <shardHandlerFactory name="shardHandlerFactory"
 class="HttpShardHandlerFactory">
 <int name="socketTimeout">${socketTimeout:0}</int>
 <int name="connTimeout">${connTimeout:0}</int>
 </shardHandlerFactory>
 </solr>
 
 
 In solrconfig.xml I have:
 <luceneMatchVersion>4.1</luceneMatchVersion>
 
 <dataDir>/opt/indexes4.1/address</dataDir>
 
 
 And the log4j logs in catalina.out:
 ...
 INFO: Deploying configuration descriptor solr_address.xml
 0 [main] INFO org.apache.solr.servlet.SolrDispatchFilter – 
 SolrDispatchFilter.init()
 24 [main] INFO org.apache.solr.core.SolrResourceLoader – Using JNDI 
 solr.home: /opt/solr4.4/solr_address
 26 [main] INFO org.apache.solr.core.SolrResourceLoader – new 
 SolrResourceLoader for directory: '/opt/solr4.4/solr_address/'
 176 [main] INFO org.apache.solr.core.ConfigSolr – Loading container 
 configuration from /opt/solr4.4/solr_address/solr.xml
 272 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores 
 in /opt/solr4.4/solr_address
 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores 
 in /opt/solr4.4/solr_address/conf
 276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores 
 in /opt/solr4.4/solr_address/conf/xslt
 277 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores 
 in /opt/solr4.4/solr_address/conf/lang
 278 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for cores 
 in /opt/solr4.4/solr_address/conf/velocity
 283 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer 
 991552899
 284 [main] INFO org.apache.solr.core.CoreContainer – Loading cores into 
 CoreContainer [instanceDir=/opt/solr4.4/solr_address/]
 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – 
 Setting socketTimeout to: 0
 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – 
 Setting urlScheme to: http://
 301 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – 
 Setting connTimeout to: 0
 302 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – 
 Setting maxConnectionsPerHost to: 20
 302 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – 
 Setting corePoolSize to: 0
 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – 
 Setting maximumPoolSize to: 2147483647
 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – 
 Setting maxThreadIdleTime to: 5
 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – 
 Setting sizeOfQueue to: -1
 303 [main] INFO org.apache.solr.handler.component.HttpShardHandlerFactory – 
 Setting fairnessPolicy to: false
 320 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating 
 new http client, 
 config:maxConnectionsPerHost=20maxConnections=1socketTimeout=0connTimeout=0retry=false
 420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log Listener 
 [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
 422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper 
 client=192.168.10.206:2181
 429 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil – Creating 
 new http client, 
 config:maxConnections=500maxConnectionsPerHost=16socketTimeout=0connTimeout=0
 487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting for 
 client to connect to ZooKeeper
 540 [main-EventThread] INFO org.apache.solr.common.cloud.ConnectionManager – 
 Watcher org.apache.solr.common.cloud.ConnectionManager@7dc21ece 
 name:ZooKeeperConnection Watcher:192.168.10.206:2181 got event WatchedEvent 
 state:SyncConnected type:None path:null path:null type:None
 541 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Client is 
 connected to ZooKeeper
 562 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
 /overseer/queue
 578 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
 /overseer/collection-queue-work
 591 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath: 
 /live_nodes
 595 [main] INFO org.apache.solr.cloud.ZkController – Register node as live in 
 

Re: solr 4.4 config trouble

2013-09-30 Thread Kishan Parmar
http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



On Mon, Sep 30, 2013 at 5:33 AM, Siegfried Goeschl sgoes...@gmx.at wrote:

 Hi Marc,

 what exactly is not working - no obvious problems in the logs as far as I can see

 Cheers,

 Siegfried Goeschl

 On 30.09.2013 at 11:44, Marc des Garets m...@ttux.net wrote:

  Hi,
 
  I'm running solr in tomcat. I am trying to upgrade to solr 4.4 but I
 can't get it to work. If someone can point me at what I'm doing wrong.
 
  tomcat context:
  Context docBase=/opt/solr4.4/dist/solr-4.4.0.war debug=0
 crossContext=true
  Environment name=solr/home type=java.lang.String
 value=/opt/solr4.4/solr_address override=true /
  /Context
 
 
  core.properties:
  name=address
  collection=address
  coreNodeName=address
  dataDir=/opt/indexes4.1/address
 
 
  solr.xml:
  ?xml version=1.0 encoding=UTF-8 ?
  solr
  solrcloud
  str name=host${host:}/str
  int name=hostPort8080/int
  str name=hostContextsolr_address/str
  int name=zkClientTimeout${zkClientTimeout:15000}/int
  bool name=genericCoreNodeNamesfalse/bool
  /solrcloud
 
  shardHandlerFactory name=shardHandlerFactory
  class=HttpShardHandlerFactory
  int name=socketTimeout${socketTimeout:0}/int
  int name=connTimeout${connTimeout:0}/int
  /shardHandlerFactory
  /solr
 
 
  In solrconfig.xml I have:
  luceneMatchVersion4.1/luceneMatchVersion
 
  dataDir/opt/indexes4.1/address/dataDir
 
 
  And the log4j logs in catalina.out:
  ...
  INFO: Deploying configuration descriptor solr_address.xml
  0 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
 SolrDispatchFilter.init()
  24 [main] INFO org.apache.solr.core.SolrResourceLoader – Using JNDI
 solr.home: /opt/solr4.4/solr_address
  26 [main] INFO org.apache.solr.core.SolrResourceLoader – new
 SolrResourceLoader for directory: '/opt/solr4.4/solr_address/'
  176 [main] INFO org.apache.solr.core.ConfigSolr – Loading container
 configuration from /opt/solr4.4/solr_address/solr.xml
  272 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address
  276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address/conf
  276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address/conf/xslt
  277 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address/conf/lang
  278 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address/conf/velocity
  283 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer
 991552899
  284 [main] INFO org.apache.solr.core.CoreContainer – Loading cores into
 CoreContainer [instanceDir=/opt/solr4.4/solr_address/]
  301 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 socketTimeout to: 0
  301 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 urlScheme to: http://
  301 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 connTimeout to: 0
  302 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 maxConnectionsPerHost to: 20
  302 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 corePoolSize to: 0
  303 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 maximumPoolSize to: 2147483647
  303 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 maxThreadIdleTime to: 5
  303 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 sizeOfQueue to: -1
  303 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 fairnessPolicy to: false
  320 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil –
 Creating new http client,
 config:maxConnectionsPerHost=20maxConnections=1socketTimeout=0connTimeout=0retry=false
  420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log
 Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
  422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper client=
 192.168.10.206:2181
  429 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil –
 Creating new http client,
 config:maxConnections=500maxConnectionsPerHost=16socketTimeout=0connTimeout=0
  487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting
 for client to connect to ZooKeeper
  540 [main-EventThread] INFO
 org.apache.solr.common.cloud.ConnectionManager – Watcher
 org.apache.solr.common.cloud.ConnectionManager@7dc21ecename:ZooKeeperConnection
  Watcher:
 192.168.10.206:2181 got event WatchedEvent state:SyncConnected type:None
 path:null path:null type:None
  541 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Client
 is connected to ZooKeeper
  562 [main] INFO org.apache.solr.common.cloud.SolrZkClient – makePath:
 

AW: Re: solr 4.4 config trouble

2013-09-30 Thread sgoeschl
Not sure if you are doing your company a favour ;-)

Cheers

Siegfried Goeschl


Sent from Samsung Mobile

 Original message 
From: Kishan Parmar kishan@gmail.com 
Date:  
To: solr-user@lucene.apache.org 
Subject: Re: solr 4.4 config trouble 
 
http://www.coretechnologies.com/products/AlwaysUp/Apps/RunApacheSolrAsAService.html

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



On Mon, Sep 30, 2013 at 5:33 AM, Siegfried Goeschl sgoes...@gmx.at wrote:

 Hi Marc,

 what exactly is not working - no obvious problems in the logs as far as I can see

 Cheers,

 Siegfried Goeschl

 On 30.09.2013 at 11:44, Marc des Garets m...@ttux.net wrote:

  Hi,
 
  I'm running solr in tomcat. I am trying to upgrade to solr 4.4 but I
 can't get it to work. If someone can point me at what I'm doing wrong.
 
  tomcat context:
  Context docBase=/opt/solr4.4/dist/solr-4.4.0.war debug=0
 crossContext=true
  Environment name=solr/home type=java.lang.String
 value=/opt/solr4.4/solr_address override=true /
  /Context
 
 
  core.properties:
  name=address
  collection=address
  coreNodeName=address
  dataDir=/opt/indexes4.1/address
 
 
  solr.xml:
  ?xml version=1.0 encoding=UTF-8 ?
  solr
  solrcloud
  str name=host${host:}/str
  int name=hostPort8080/int
  str name=hostContextsolr_address/str
  int name=zkClientTimeout${zkClientTimeout:15000}/int
  bool name=genericCoreNodeNamesfalse/bool
  /solrcloud
 
  shardHandlerFactory name=shardHandlerFactory
  class=HttpShardHandlerFactory
  int name=socketTimeout${socketTimeout:0}/int
  int name=connTimeout${connTimeout:0}/int
  /shardHandlerFactory
  /solr
 
 
  In solrconfig.xml I have:
  luceneMatchVersion4.1/luceneMatchVersion
 
  dataDir/opt/indexes4.1/address/dataDir
 
 
  And the log4j logs in catalina.out:
  ...
  INFO: Deploying configuration descriptor solr_address.xml
  0 [main] INFO org.apache.solr.servlet.SolrDispatchFilter –
 SolrDispatchFilter.init()
  24 [main] INFO org.apache.solr.core.SolrResourceLoader – Using JNDI
 solr.home: /opt/solr4.4/solr_address
  26 [main] INFO org.apache.solr.core.SolrResourceLoader – new
 SolrResourceLoader for directory: '/opt/solr4.4/solr_address/'
  176 [main] INFO org.apache.solr.core.ConfigSolr – Loading container
 configuration from /opt/solr4.4/solr_address/solr.xml
  272 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address
  276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address/conf
  276 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address/conf/xslt
  277 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address/conf/lang
  278 [main] INFO org.apache.solr.core.SolrCoreDiscoverer – Looking for
 cores in /opt/solr4.4/solr_address/conf/velocity
  283 [main] INFO org.apache.solr.core.CoreContainer – New CoreContainer
 991552899
  284 [main] INFO org.apache.solr.core.CoreContainer – Loading cores into
 CoreContainer [instanceDir=/opt/solr4.4/solr_address/]
  301 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 socketTimeout to: 0
  301 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 urlScheme to: http://
  301 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 connTimeout to: 0
  302 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 maxConnectionsPerHost to: 20
  302 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 corePoolSize to: 0
  303 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 maximumPoolSize to: 2147483647
  303 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 maxThreadIdleTime to: 5
  303 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 sizeOfQueue to: -1
  303 [main] INFO
 org.apache.solr.handler.component.HttpShardHandlerFactory – Setting
 fairnessPolicy to: false
  320 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil –
 Creating new http client,
 config:maxConnectionsPerHost=20maxConnections=1socketTimeout=0connTimeout=0retry=false
  420 [main] INFO org.apache.solr.logging.LogWatcher – Registering Log
 Listener [Log4j (org.slf4j.impl.Log4jLoggerFactory)]
  422 [main] INFO org.apache.solr.core.ZkContainer – Zookeeper client=
 192.168.10.206:2181
  429 [main] INFO org.apache.solr.client.solrj.impl.HttpClientUtil –
 Creating new http client,
 config:maxConnections=500maxConnectionsPerHost=16socketTimeout=0connTimeout=0
  487 [main] INFO org.apache.solr.common.cloud.ConnectionManager – Waiting
 for client to connect to ZooKeeper
  540 [main-EventThread] INFO
 org.apache.solr.common.cloud.ConnectionManager – Watcher
 org.apache.solr.common.cloud.ConnectionManager@7dc21ecename:ZooKeeperConnection
  Watcher:
 

Re: OpenJDK or OracleJDK

2013-09-30 Thread Bram Van Dam

On 09/30/2013 01:11 PM, Raheel Hasan wrote:

Could someone tell me if OpenJDK or OracleJDK will be best for Apache Solr
over CentOS?


If you're using Java 7 (or 8) then it doesn't matter. If you're using 
Java 6, stick with the Oracle version.




Re: Solr sorting situation!

2013-09-30 Thread Gustav
Anyone with any ideas?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-sorting-situation-tp4091966p4092688.html
Sent from the Solr - User mailing list archive at Nabble.com.


filterCache stats reported wrongly in solr admin?

2013-09-30 Thread Dmitry Kan
Hi!

Can it really be so that filterCache size is 63, inserts 103 and zero
evictions?

Is this a bug or am I misinterpreting the stats?

http://pasteboard.co/9Dmkc4H.png

Thanks,

Dmitry


Re: Atomic updates with solr cloud in solr 4.4

2013-09-30 Thread Sesha Sendhil Subramanian
The field variant_count is stored and is not the target of a copyfield.
<field name="variant_count" type="int" indexed="true" stored="true"
required="true" multiValued="false"/>

However I did notice that we were setting the same coreNodeName on both the
shards in core.properties. Removing this property fixed the issue and
updates succeed.

What role does this play in handling updates and why were other queries
using the select handler not failing?

Thanks
Sesha


On Sat, Sep 21, 2013 at 7:59 PM, Yonik Seeley yo...@lucidworks.com wrote:

 I can't reproduce this.
 I tried starting up a 2 shard cluster and then followed the example here:
 http://yonik.com/solr/atomic-updates/

 book1 was on shard2 (port 7574) and everything still worked fine.

  missing required field: variant_count

 Perhaps the problem is document specific... What can you say about
 this variant_count field?
 Is it stored?  Is it the target of a copyField?


 -Yonik
 http://lucidworks.com




 On Tue, Sep 17, 2013 at 12:56 PM, Sesha Sendhil Subramanian
 seshasend...@indix.com wrote:
  curl http://localhost:8983/solr/search/update -H
  'Content-type:application/json' -d '
  [
   {
id:
  c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675,
link_id_45454 : {set:abcdegff}
   }
  ]'
 
  I have two collections search and meta. I want to do an update in the
  search collection.
  If i pick a document in same shard : localhost:8983, the update succeeds
 
  15350327 [qtp386373885-19] INFO
   org.apache.solr.update.processor.LogUpdateProcessor  ? [search]
  webapp=/solr path=/update params={}
  {add=[6cfcb56ca52b56ccb1377a7f0842e74d!6cfcb56ca52b56ccb1377a7f0842e74d
  (1446444025873694720)]} 0 5
 
  If i pick a document on a different shard : localhost:7574, the update
 fails
 
  15438547 [qtp386373885-75] INFO
   org.apache.solr.update.processor.LogUpdateProcessor  ? [search]
  webapp=/solr path=/update params={} {} 0 1
  15438548 [qtp386373885-75] ERROR org.apache.solr.core.SolrCore  ?
  org.apache.solr.common.SolrException:
  [doc=c8cce27c1d8129d733a3df3de68dd675!c8cce27c1d8129d733a3df3de68dd675]
  missing required field: variant_count
  at
 
 org.apache.solr.update.DocumentBuilder.toDocument(DocumentBuilder.java:189)
  at
 
 org.apache.solr.update.AddUpdateCommand.getLuceneDocument(AddUpdateCommand.java:73)
  at
 
 org.apache.solr.update.DirectUpdateHandler2.addDoc(DirectUpdateHandler2.java:210)
  at
 
 org.apache.solr.update.processor.RunUpdateProcessor.processAdd(RunUpdateProcessorFactory.java:69)
  at
 
 org.apache.solr.update.processor.UpdateRequestProcessor.processAdd(UpdateRequestProcessor.java:51)
  at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.doLocalAdd(DistributedUpdateProcessor.java:556)
  at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.versionAdd(DistributedUpdateProcessor.java:692)
  at
 
 org.apache.solr.update.processor.DistributedUpdateProcessor.processAdd(DistributedUpdateProcessor.java:435)
  at
 
 org.apache.solr.update.processor.LogUpdateProcessor.processAdd(LogUpdateProcessorFactory.java:100)
  at
 
 org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.handleAdds(JsonLoader.java:392)
  at
 
 org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.processUpdate(JsonLoader.java:117)
  at
 
 org.apache.solr.handler.loader.JsonLoader$SingleThreadedJsonLoader.load(JsonLoader.java:101)
  at org.apache.solr.handler.loader.JsonLoader.load(JsonLoader.java:65)
  at
 
 org.apache.solr.handler.UpdateRequestHandler$1.load(UpdateRequestHandler.java:92)
  at
 
 org.apache.solr.handler.ContentStreamHandlerBase.handleRequestBody(ContentStreamHandlerBase.java:74)
  at
 
 org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:135)
  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1904)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:659)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:362)
  at
 
 org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:158)
  at
 
 org.eclipse.jetty.servlet.ServletHandler$CachedChain.doFilter(ServletHandler.java:1419)
  at
 
 org.eclipse.jetty.servlet.ServletHandler.doHandle(ServletHandler.java:455)
  at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:137)
  at
 
 org.eclipse.jetty.security.SecurityHandler.handle(SecurityHandler.java:557)
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doHandle(SessionHandler.java:231)
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doHandle(ContextHandler.java:1075)
  at
 org.eclipse.jetty.servlet.ServletHandler.doScope(ServletHandler.java:384)
  at
 
 org.eclipse.jetty.server.session.SessionHandler.doScope(SessionHandler.java:193)
  at
 
 org.eclipse.jetty.server.handler.ContextHandler.doScope(ContextHandler.java:1009)
  at
 
 org.eclipse.jetty.server.handler.ScopedHandler.handle(ScopedHandler.java:135)
  at
 
 

Re: filterCache stats reported wrongly in solr admin?

2013-09-30 Thread Dmitry Kan
Looking at the code reveals that the put (= insert) operation increases the
counter regardless of duplicates, while size reports unique keys only.

Thanks to ehatcher for the hint.

Dmitry
On 30 Sep 2013 16:23, Dmitry Kan solrexp...@gmail.com wrote:

 Hi!

 Can it really be so that filterCache size is 63, inserts 103 and zero
 evictions?

 Is this a bug or am I misinterpreting the stats?

 http://pasteboard.co/9Dmkc4H.png

 Thanks,

 Dmitry
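The counter semantics above can be illustrated with a toy cache. This is not Solr's implementation, just a minimal sketch of why inserts can exceed size with zero evictions (all names invented):

```java
import java.util.HashMap;
import java.util.Map;

// Toy cache: put() counts as an insert even when the key is already cached,
// while size() reports distinct keys, so repeated puts of the same filter
// query inflate the inserts count past the size with no evictions.
public class ToyFilterCache {
    private final Map<String, Object> map = new HashMap<>();
    private long inserts = 0;

    public void put(String key, Object value) {
        inserts++;               // bumped for repeats of the same filter query
        map.put(key, value);
    }

    public long inserts() { return inserts; }
    public int size() { return map.size(); }

    public static void main(String[] args) {
        ToyFilterCache cache = new ToyFilterCache();
        cache.put("type:shirt", new int[]{1, 2});
        cache.put("type:shirt", new int[]{1, 2}); // same filter query again
        cache.put("size:large", new int[]{3});
        // prints: 3 inserts, size 2
        System.out.println(cache.inserts() + " inserts, size " + cache.size());
    }
}
```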



Re: Solr Autocomplete with did you means functionality handle misspell word like google

2013-09-30 Thread Alessandro Benedetti
It's really simple indeed. Solr provides the SpellCheck [1] feature that
allows you to do this.
You only have to configure the RequestHandler and the Search Component.
And of course develop a simple UI (you can find an example in the Velocity
response handler, Solritas [2]).

Cheers

[1] https://cwiki.apache.org/confluence/display/solr/Spell+Checking
[2] https://cwiki.apache.org/confluence/display/solr/Velocity+Search+UI
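For reference, a minimal sketch of such a configuration (the field name "text" and handler path "/spell" are illustrative assumptions, not from this thread; assumes a stock Solr 4.x solrconfig.xml):

```xml
<!-- Sketch only: field/handler names are assumptions -->
<searchComponent name="spellcheck" class="solr.SpellCheckComponent">
  <lst name="spellchecker">
    <str name="name">default</str>
    <str name="field">text</str>
    <str name="classname">solr.DirectSolrSpellChecker</str>
  </lst>
</searchComponent>

<requestHandler name="/spell" class="solr.SearchHandler">
  <lst name="defaults">
    <str name="spellcheck">true</str>
    <str name="spellcheck.dictionary">default</str>
    <str name="spellcheck.collate">true</str>
  </lst>
  <arr name="last-components">
    <str>spellcheck</str>
  </arr>
</requestHandler>
```

With something like this in place, a query such as /spell?q=cmputer would typically return "computer" as a collated suggestion, which the UI can present as "did you mean".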


2013/9/27 Otis Gospodnetic otis.gospodne...@gmail.com

 Hi,

 Not sure if Solr suggester can do this (can it, anyone?), but...
 shameless plug... I know
 http://sematext.com/products/autocomplete/index.html can do that.

 Otis
 --
 Solr  ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



 On Thu, Sep 26, 2013 at 8:26 AM, Suneel Pandey pandey.sun...@gmail.com
 wrote:
  http://lucene.472066.n3.nabble.com/file/n4092127/autocomplete.png
 
  Hi,
 
  I have implemented autocomplete and it's working fine, but I want to
  implement autosuggestion like Google (see above screen): when someone types a
  misspelled word, a suggestion should be shown, e.g. cmputer => computer.
 
 
  Please help me.
 
 
 
 
 
 
  -
  Regards,
 
  Suneel Pandey
  Sr. Software Developer
  --
  View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-Autocomplete-with-did-you-means-functionality-handle-misspell-word-like-google-tp4092127.html
  Sent from the Solr - User mailing list archive at Nabble.com.




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: Solr and jvm Garbage Collection tuning

2013-09-30 Thread Alessandro Benedetti
I think this could help : http://wiki.apache.org/solr/ShawnHeisey#GC_Tuning

Cheers


2013/9/27 ewinclub7 ewincl...@hotmail.com





 --
 View this message in context:
 http://lucene.472066.n3.nabble.com/Solr-and-jvm-Garbage-Collection-tuning-tp1455467p4092328.html
 Sent from the Solr - User mailing list archive at Nabble.com.




-- 
--

Benedetti Alessandro
Visiting card : http://about.me/alessandro_benedetti

Tyger, tyger burning bright
In the forests of the night,
What immortal hand or eye
Could frame thy fearful symmetry?

William Blake - Songs of Experience -1794 England


Re: OpenJDK or OracleJDK

2013-09-30 Thread Raheel Hasan
Hmm, why is that so?
Isn't Oracle's version a bit slow?


On Mon, Sep 30, 2013 at 5:56 PM, Bram Van Dam bram.van...@intix.eu wrote:

 On 09/30/2013 01:11 PM, Raheel Hasan wrote:

 Could someone tell me if OpenJDK or OracleJDK will be best for Apache Solr
 over CentOS?


 If you're using Java 7 (or 8) then it doesn't matter. If you're using Java
 6, stick with the Oracle version.




-- 
Regards,
Raheel Hasan


Re: XPathEntityProcessor nested in TikaEntityProcessor query null exception

2013-09-30 Thread P Williams
Hi Andreas,

When using XPathEntityProcessor
(http://wiki.apache.org/solr/DataImportHandler#XPathEntityProcessor) your
DataSource must be of type DataSource<Reader>.  You shouldn't be using
BinURLDataSource; it's giving you the cast exception.  Use URLDataSource
(https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/URLDataSource.html)
or FileDataSource
(https://builds.apache.org/job/Solr-Artifacts-4.x/javadoc/solr-dataimporthandler/org/apache/solr/handler/dataimport/FileDataSource.html)
instead.

I don't think you need to specify namespaces, at least you didn't used to.
 The other thing that I've noticed is that the anywhere xpath expression //
doesn't always work in DIH.  You might have to be more specific.

Cheers,
Tricia
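For what it's worth, a minimal data-config sketch of that suggestion (the URL, entity and field names here are invented for illustration, assuming a standard Solr 4.x DIH setup):

```xml
<dataConfig>
  <!-- URLDataSource hands the entity a Reader, which XPathEntityProcessor
       requires; BinURLDataSource returns an InputStream and causes the
       ClassCastException seen in the thread -->
  <dataSource type="URLDataSource" name="dataUrl"/>
  <document>
    <entity name="page" processor="XPathEntityProcessor"
            url="http://example.com/page.xhtml"
            forEach="/html" dataSource="dataUrl">
      <!-- prefer explicit paths; the '//' anywhere-xpath does not always
           work in DIH -->
      <field column="h_1" xpath="/html/body/h1"/>
    </entity>
  </document>
</dataConfig>
```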





On Sun, Sep 29, 2013 at 9:47 AM, Andreas Owen a...@conx.ch wrote:

 how dumb can you get. obviously quite dumb... i would have to analyze the
 html-pages with a nested instance like this:

 <entity name="rec" processor="XPathEntityProcessor"
 url="file:///C:\ColdFusion10\cfusion\solr\solr\tkbintranet\docImportUrl.xml"
 forEach="/docs/doc" dataSource="main">

 <entity name="htm" processor="XPathEntityProcessor"
 url="${rec.urlParse}" forEach="/xhtml:html" dataSource="dataUrl">
 <field column="text" xpath="//content" />
 <field column="h_2" xpath="//body" />
 <field column="text_nohtml" xpath="//text" />
 <field column="h_1" xpath="//h:h1" />
 </entity>
 </entity>

 but i'm pretty sure the forEach and the xpath expressions are wrong. at the
 moment i'm getting the following error:

 Caused by: java.lang.RuntimeException:
 org.apache.solr.handler.dataimport.DataImportHandlerException:
 java.lang.ClassCastException:
 sun.net.www.protocol.http.HttpURLConnection$HttpInputStream cannot be cast
 to java.io.Reader





 On 28. Sep 2013, at 1:39 AM, Andreas Owen wrote:

  ok i see what you're getting at but why doesn't the following work:
 
   <field xpath="//h:h1" column="h_1" />
   <field column="text" xpath="/xhtml:html/xhtml:body" />
 
  i removed the tika-processor. what am i missing? i haven't found
 anything in the wiki?
 
 
  On 28. Sep 2013, at 12:28 AM, P Williams wrote:
 
  I spent some more time thinking about this.  Do you really need to use
 the
  TikaEntityProcessor?  It doesn't offer anything new to the document you
 are
  building that couldn't be accomplished by the XPathEntityProcessor alone
  from what I can tell.
 
  I also tried to get the Advanced Parsing example
  (http://wiki.apache.org/solr/TikaEntityProcessor) to work without
  success.  There are some obvious typos (<document> instead of
  </document>) and an odd order to the pieces (the <dataSource> elements
  are enclosed by <document>).  It also looks like FieldStreamDataSource
  (http://lucene.apache.org/solr/4_3_1/solr-dataimporthandler/org/apache/solr/handler/dataimport/FieldStreamDataSource.html)
  is
  the one that is meant to work in this context. If Koji is still around
  maybe he could offer some help?  Otherwise this bit of erroneous
  instruction should probably be removed from the wiki.
 
  Cheers,
  Tricia
 
  $ svn diff
  Index:
 
 solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
  ===
  ---
 
 solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
 (revision 1526990)
  +++
 
 solr/contrib/dataimporthandler-extras/src/test/org/apache/solr/handler/dataimport/TestTikaEntityProcessor.java
 (working copy)
  @@ -99,13 +99,13 @@
 runFullImport(getConfigHTML(identity));
 assertQ(req(*:*), testsHTMLIdentity);
   }
  -
  +
   private String getConfigHTML(String htmlMapper) {
 return
 dataConfig +
   dataSource type='BinFileDataSource'/ +
   document +
  -entity name='Tika' format='xml'
  processor='TikaEntityProcessor'  +
  +entity name='Tika' format='html'
  processor='TikaEntityProcessor'  +
url=' +
  getFile(dihextras/structured.html).getAbsolutePath() + '  +
 ((htmlMapper == null) ?  : ( htmlMapper=' + htmlMapper +
  ')) +  +
   field column='text'/ +
  @@ -114,4 +114,36 @@
 /dataConfig;
 
   }
  +  private String[] testsHTMLH1 = {
  +  //*[@numFound='1']
  +  , //str[@name='h1'][contains(.,'H1 Header')]
  +  };
  +
  +  @Test
  +  public void testTikaHTMLMapperSubEntity() throws Exception {
  +runFullImport(getConfigSubEntity(identity));
  +assertQ(req(*:*), testsHTMLH1);
  +  }
  +
  +  private String getConfigSubEntity(String htmlMapper) {
  +return
  +dataConfig +
  +dataSource type='BinFileDataSource' name='bin'/ +
  +dataSource type='FieldStreamDataSource' name='fld'/ +
  +document +
  +entity name='tika' 

Re: Cross index join query performance

2013-09-30 Thread Peter Keegan
Ah, got it now - thanks for the explanation.


On Sat, Sep 28, 2013 at 3:33 AM, Upayavira u...@odoko.co.uk wrote:

 The thing here is to understand how a join works.

 Effectively, it does the inner query first, which results in a list of
 terms. It then effectively does a multi-term query with those values.

 q=size:large {!join fromIndex=other from=someid
 to=someotherid}type:shirt

 Imagine the inner join returned values A,B,C. Your inner query is, on
 core 'other', q=type:shirt&fl=someid.

 Then your outer query becomes size:large someotherid:(A B C)

 Your inner query returns 25k values. You're having to do a multi-term
 query for 25k terms. That is *bound* to be slow.

 The pseudo-joins in Solr 4.x are intended for a small to medium number
 of values returned by the inner query, otherwise performance degrades as
 you are seeing.

 Is there a way you can reduce the number of values returned by the inner
 query?

 As Joel mentions, those other joins are attempts to find other ways to
 work with this limitation.

 Upayavira
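The two phases described above can be sketched as a toy in-memory model (plain Java with invented field names; not Solr internals): collect the join keys matched by the inner query, then filter the outer result set by membership in that key set. The cost of the second phase grows with the number of keys, which is why 25k inner results hurt.

```java
import java.util.*;
import java.util.stream.*;

// Toy two-phase join: phase 1 runs the inner query on the "from" side and
// collects its join-key values; phase 2 rewrites the outer query into a
// membership test over that key set.
public class JoinSketch {

    // Phase 1: inner query q=type:shirt, fl=someid -> set of join keys
    static Set<String> innerQuery(List<Map<String, String>> otherCore) {
        return otherCore.stream()
                .filter(doc -> "shirt".equals(doc.get("type")))
                .map(doc -> doc.get("someid"))
                .collect(Collectors.toSet());
    }

    // Phase 2: outer query size:large AND someotherid IN keys
    static List<Map<String, String>> outerQuery(List<Map<String, String>> mainCore,
                                                Set<String> keys) {
        return mainCore.stream()
                .filter(doc -> "large".equals(doc.get("size")))
                .filter(doc -> keys.contains(doc.get("someotherid")))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Map<String, String>> other = List.of(
                Map.of("someid", "A", "type", "shirt"),
                Map.of("someid", "B", "type", "hat"));
        List<Map<String, String>> main = List.of(
                Map.of("someotherid", "A", "size", "large"),
                Map.of("someotherid", "A", "size", "small"),
                Map.of("someotherid", "B", "size", "large"));
        Set<String> keys = innerQuery(other);              // {A}
        System.out.println(outerQuery(main, keys).size()); // prints 1
    }
}
```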

 On Fri, Sep 27, 2013, at 09:44 PM, Peter Keegan wrote:
  Hi Joel,
 
  I tried this patch and it is quite a bit faster. Using the same query on
  a
  larger index (500K docs), the 'join' QTime was 1500 msec, and the 'hjoin'
  QTime was 100 msec! This was for true for large and small result sets.
 
  A few notes: the patch didn't compile with 4.3 because of the
  SolrCore.getLatestSchema call (which I worked around), and the package
  name
  should be:
   <queryParser name="hjoin"
   class="org.apache.solr.search.joins.HashSetJoinQParserPlugin"/>
 
  Unfortunately, I just learned that our uniqueKey may have to be an
  alphanumeric string instead of an int, so I'm not out of the woods yet.
 
  Good stuff - thanks.
 
  Peter
 
 
  On Thu, Sep 26, 2013 at 6:49 PM, Joel Bernstein joels...@gmail.com
  wrote:
 
   It looks like you are using int join keys so you may want to check out
   SOLR-4787, specifically the hjoin and bjoin.
  
   These perform well when you have a large number of results from the
   fromIndex. If you have a small number of results in the fromIndex the
   standard join will be faster.
  
  
   On Wed, Sep 25, 2013 at 3:39 PM, Peter Keegan peterlkee...@gmail.com
   wrote:
  
I forgot to mention - this is Solr 4.3
   
Peter
   
   
   
On Wed, Sep 25, 2013 at 3:38 PM, Peter Keegan 
 peterlkee...@gmail.com
wrote:
   
 I'm doing a cross-core join query and the join query is 30X slower
 than
 each of the 2 individual queries. Here are the queries:

 Main query:
 http://localhost:8983/solr/mainindex/select?q=title:java
 QTime: 5 msec
 hit count: 1000

 Sub query: http://localhost:8983/solr/subindex/select?q=+fld1:[0.1 TO 0.3]
 QTime: 4 msec
 hit count: 25K

 Join query:

   
  
  http://localhost:8983/solr/mainindex/select?q=title:java&fq={!join fromIndex=mainindex toIndex=subindex from=docid to=docid}fld1:[0.1
  TO 0.3]
 QTime: 160 msec
 hit count: 205

 Here are the index spec's:

 mainindex size: 117K docs, 1 segment
 mainindex schema:
    <field name="docid" type="int" indexed="true" stored="true"
 required="true" multiValued="false" />
    <field name="title" type="text_en_splitting" indexed="true"
 stored="true" multiValued="false" />
    <uniqueKey>docid</uniqueKey>

 subindex size: 117K docs, 1 segment
 subindex schema:
    <field name="docid" type="int" indexed="true" stored="true"
 required="true" multiValued="false" />
    <field name="fld1" type="float" indexed="true" stored="true"
 required="false" multiValued="false" />
    <uniqueKey>docid</uniqueKey>

 With debugQuery=true I see:
   "debug": {
     "join": {
       "{!join from=docid to=docid fromIndex=subindex}fld1:[0.1 TO 0.3]": {
         "time": 155,
         "fromSetSize": 24742,
         "toSetSize": 24742,
         "fromTermCount": 117810,
         "fromTermTotalDf": 117810,
         "fromTermDirectCount": 117810,
         "fromTermHits": 24742,
         "fromTermHitsTotalDf": 24742,
         "toTermHits": 24742,
         "toTermHitsTotalDf": 24742,
         "toTermDirectCount": 24627,
         "smallSetsDeferred": 115,
         "toSetDocsAdded": 24742}},

 Via profiler and debugger, I see 150 msec spent in the outer
 'while(term!=null)' loop in: JoinQueryWeight.getDocSet(). This
 seems
like a
 lot of time to join the bitsets. Does this seem right?

 Peter


   
  
  
  
   --
   Joel Bernstein
   Professional Services LucidWorks
  



Considerations about setting maxMergedSegmentMB

2013-09-30 Thread Isaac Hebsh
Hi,
Trying to solve a query performance issue, we suspect the number of index
segments, which might slow queries (due to I/O seeks that happen for each
term in the query, multiplied by the number of segments).
We are on Solr 4.3 (TieredMergePolicy with mergeFactor of 4).

We can reduce the number of segments by enlarging maxMergedSegmentMB, from
the default 5GB to something bigger (10GB, 15GB?).

What are the side effects, which should be considered when doing it?
Has anyone changed this setting in PROD for a while?
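For reference, maxMergedSegmentMB is configured on the merge policy in solrconfig.xml. A hedged sketch for Solr 4.x follows; the values are only the numbers floated above (mergeFactor of 4, a 10GB cap), not recommendations:

```xml
<!-- solrconfig.xml (Solr 4.x), illustrative values only -->
<indexConfig>
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">4</int>
    <int name="segmentsPerTier">4</int>
    <double name="maxMergedSegmentMB">10240.0</double> <!-- 10GB cap -->
  </mergePolicy>
</indexConfig>
```

A larger cap means fewer, bigger segments, at the price of heavier merge I/O and more data rewritten per merge.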


Searching on (hyphenated/capitalized) word issue

2013-09-30 Thread Van Tassell, Kristian
I have a search term "multi-CAD" being issued against tokenized text. The
problem is that you cannot get any search results when you type "multicad"
unless you add a hyphen ("multi-cad") or type "multiCAD" (omitting the
hyphen, but correctly matching the capitalization).



However, for the similar but unhyphenated word "AutoCAD", you can type
"autocad" and get hits for "AutoCAD", as you would expect. You can type
"auto-cad" and get the same results.

The query seems to get parsed as separate words (resulting in hits) for
"multi-CAD", "multiCAD", "autocad", "auto-cad" and "AUTOCAD", but not for
"multicad". In other words, the search terms become "multi cad" and "auto
cad" in all cases except when the term is "multicad".

I'm guessing this may be due in part to "auto" being a more common word
prefix, but I may be wrong. Can anyone provide some clarity (and maybe
point me toward a potential solution)?

Thanks in advance!


Kristian Van Tassell
Siemens Industry Sector
Siemens Product Lifecycle Management Software Inc.
5939 Rice Creek Parkway
Shoreview, MN  55126 United States
Tel.  :+1 (651) 855-6194
Fax  :+1 (651) 855-6280
kristian.vantass...@siemens.com
www.siemens.com/plm



Re: Data duplication using Cloud+HDFS+Mirroring

2013-09-30 Thread Isaac Hebsh
Hi Greg, Did you get an answer?
I'm interested in the same question.

More generally, what are the benefits of HdfsDirectoryFactory, besides the
transparent restore of the shard contents in case of a disk failure, and
the ability to rebuild the index using MapReduce?
Is the following statement accurate? "Blocks of a particular shard, which are
replicated to another node, will never be queried, since there is no Solr
core configured to read them."


On Wed, Aug 7, 2013 at 8:46 PM, Greg Walters
gwalt...@sherpaanalytics.comwrote:

 While testing Solr's new ability to store data and transaction directories
 in HDFS I added an additional core to one of my testing servers that was
 configured as a backup (active but not leader) core for a shard elsewhere.
 It looks like this extra core copies the data into its own directory rather
 than just using the existing directory with the data that's already
 available to it.

 Since HDFS likely already has redundancy of the data covered via the
 replicationFactor is there a reason for non-leader cores to create their
 own data directory rather than doing reads on the existing master copy? I
 searched Jira for anything that suggests this behavior might change and
 didn't find any issues; is there any intent to address this?

 Thanks,
 Greg



Re: Doing time sensitive search in solr

2013-09-30 Thread Darniz
Thanks for the quick answers.
i have gone through the presentation, and that's what i was leaning towards:
using dynamic fields. i just want to run through an example so that it's
clear how to approach this issue.
<entry start-date="1-sept-2013">
Sept content : Honda is releasing the car this month
</entry>
<entry start-date="1-dec-2013">
Dec content : Toyota is releasing the car this month
</entry>
After adding dynamic fields like *_entryDate and *_entryText my solr doc
will look something like this.

<date name="2013-09-01T00:00:00Z_entryDate">2013-09-01T00:00:00Z</date>
<str name="2013-09-01T00:00:00Z_entryText">Sept content : Honda is releasing
the car this month</str>

<date name="2013-12-01T00:00:00Z_entryDate">2013-12-01T00:00:00Z</date>
<str name="2013-12-01T00:00:00Z_entryText">Dec content : Toyota is releasing
the car this month</str>

if someone searches with a query something like
*_entryDate:[* TO NOW] AND *_entryText:Toyota, the Toyota entry won't show
up in the search results.

the only disadvantage we have with this approach is that we might end up
with a lot of dynamic fields, since we have thousands of entries which might
be time-bound in our CMS.
i might also do some more investigation to see if we can handle this at
index time, indexing data as its time arrives via some scheduler or
something, because the above approach might solve the issue but may make
the queries very slow.
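In plain terms, the scheme above emulates the following filtering. A purely illustrative Python sketch, using the two example entries:

```python
# Each entry carries its start date; only entries whose start date is not
# in the future are searchable. ISO-8601 timestamps compare correctly as
# strings, so no date parsing is needed here.
entries = [
    ("2013-09-01T00:00:00Z", "Sept content : Honda is releasing the car this month"),
    ("2013-12-01T00:00:00Z", "Dec content : Toyota is releasing the car this month"),
]

def visible_entries(now, term, entries=entries):
    """Entries already started (start <= now) whose text contains term."""
    return [text for start, text in entries
            if start <= now and term.lower() in text.lower()]

# In October the Dec entry has not started yet, so "Toyota" finds nothing;
# the same search in December matches it.
print(visible_entries("2013-10-15T00:00:00Z", "Toyota"))  # []
print(visible_entries("2013-12-15T00:00:00Z", "Toyota"))  # matches the Dec entry
```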


Thanks



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092763.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: Searching on (hyphenated/capitalized) word issue

2013-09-30 Thread Upayavira
You need to look at your analysis chain. The stuff you're talking about
there is all configurable.

There are different tokenisers available to split your fields differently,
and then you might use the WordDelimiterFilterFactory to split existing
tokens further (e.g. "WiFi" might become "wi", "fi" and "WiFi"). So
really, you need to craft your own analysis chain to fit the kind of
data you are working with.
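As a purely illustrative starting point (the fieldType name is invented here, and every parameter should be tuned against your own data), a schema.xml analysis chain along these lines would index "multi-CAD" as multi, cad, the catenated multicad and the original multi-cad, so a query for "multicad" can match:

```xml
<!-- Illustrative fragment only, not a drop-in fix -->
<fieldType name="text_wdf_example" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="1" catenateAll="1"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.WordDelimiterFilterFactory"
            generateWordParts="1" generateNumberParts="1"
            catenateWords="0" catenateAll="0"
            splitOnCaseChange="1" preserveOriginal="1"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldType>
```

The analysis page in the Solr admin UI is the quickest way to check how multi-CAD, multicad and autocad come out of a given chain.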

Upayavira

On Mon, Sep 30, 2013, at 06:50 PM, Van Tassell, Kristian wrote:
 I have a search term "multi-CAD" being issued against tokenized text.  The
 problem is that you cannot get any search results when you type
 "multicad" unless you add a hyphen ("multi-cad") or type "multiCAD"
 (omitting the hyphen, but correctly matching the capitalization).
 
 
 
 However, for the similar but unhyphenated word AutoCAD, you can type
 autocad and get hits for AutoCAD, as you would expect. You can type
 auto-cad and get the same results.
 
 The query seems to get parsed as separate words (resulting in hits) for
 multi-CAD, multiCAD, autocad, auto-cad and AUTOCAD, but not for multicad.
 In other words, the search terms  become multi cad and auto cad for
 all cases except for when the term is multicad.
 
 I'm guessing this may be in part to auto being a more common word
 prefix, but I may be wrong. Can anyone provide some clarity (and maybe
 point me towards a potential solution)?
 
 Thanks in advance!
 
 
 Kristian Van Tassell
 Siemens Industry Sector
 Siemens Product Lifecycle Management Software Inc.
 5939 Rice Creek Parkway
 Shoreview, MN  55126 United States
 Tel.  :+1 (651) 855-6194
 Fax  :+1 (651) 855-6280
 kristian.vantass...@siemens.com
 www.siemens.com/plm
 


Issue with Group By / Field Collapsing

2013-09-30 Thread Shamik Bandopadhyay
Hi,

  I'm trying to use group by option to remove duplicates from my search
result. I'm applying Group By option on a field called TopicId. I'm simply
appending this at the end of my query.

group=true&group.field=TopicId

Initially, the result looked great, as I was able to see the duplicates
getting removed and only the document with the highest score among the
duplicates being returned. But then when I started comparing the result
without the group-by option, something doesn't look right. For example, the
search without a group-by option returned results from Source "A", "B" and
"C". Documents from Source "A" have the TopicId field, while it's not
present in "B" or "C". When I add the group-by option, the documents from
"B" and "C" are completely ignored, though some of them have scores higher
than "A".

I'm a little confused whether this is the intended behavior.  Does group-by
mean that it'll only return results where the group-by field is present? Do
I need to use additional group-by parameters to address this?

Any pointers will be highly appreciated.

Thanks,
Shamik


Re: Hello and help :)

2013-09-30 Thread Marcelo Elias Del Valle
Upayavira,

First of all, thanks for the answers.

We have considered the possibility of doing several queries; however, in
our case we want a count to show to the user (it should take less than 2
seconds), and we could have millions of rows (meaning millions of queries)
to get this count.
Isn't there any way to filter by the count? Something like: get all
users where the number of corresponding documents in a join is less than
X.  Or all the users grouped by field F where the count of records for field
F is less than X... Or anything like that, regarding counts...

Best regards,
Marcelo Valle.


2013/9/30 Upayavira u...@odoko.co.uk

 If your app and solr aren't far apart, you shouldn't be afraid of
 multiple queries to solr per user request (I once discovered an app that
 did 36 hits to solr per user request, and despite such awfulness of
 design, no user ever complained about speed).

 You could do a query to solr for q=+user_id:X +date:[dateX TO dateY] to
 find out how many docs, then take the numFound value, if it is above Y,
 do a subsequent query to retrieve the docs, either all docs, or those in
 the relevant date range.

 Don't know if that helps.

 Upayavira

 On Sun, Sep 29, 2013, at 05:15 PM, Matheus Salvia wrote:
  Thanks for the anwser. Yes, you understood it correctly.
  The method you proposed should work perfectly, except I do have one more
  requirement that I forgot to mention earlier, and I apologize for that.
  The true problem we are facing is:
  * find all documents for userID=x, where userID=x has more than y
   documents in the index between dateA and dateB
 
  And since dateA and dateB can be any dates, its impossible to save the
  count, since we cannot foresee what date and what count will be
  requested.
 
 
  2013/9/28 Upayavira u...@odoko.co.uk
 
   To phrase your need more generically:
  
* find all documents for userID=x, where userID=x has more than y
documents in the index
  
   Is that correct?
  
   If it is, I'd probably do some work at index time. First guess, I'd
 keep
   a separate core, which has a very small document per user, storing
 just:
  
* userID
* docCount
  
   Then, when you add/delete a document, you use atomic updates to either
   increase or decrease the docCount on that user doc.
  
   Then you can use a pseudo join between these two cores relatively
   easily.
  
   q=user_id:x {!join fromIndex=user from=user_id to=user_id}+user_id:x
   +doc_count:[y TO *]
  
   Worst case, if you don't want to mess with your indexing code, I wonder
   if you could use a ScriptUpdateProcessor to do this work - not sure if
   you can have one add an entirely new, additional, document to the list,
   but may be possible.
  
   Upayavira
  
   On Fri, Sep 27, 2013, at 09:50 PM, Matheus Salvia wrote:
Sure, sorry for the inconvenience.
   
I'm having a little trouble trying to make a query in Solr. The
 problem
is:
I must be able retrieve documents that have the same value for a
specified
field, but they should only be retrieved if this value appeared more
 than
X
times for a specified user. In pseudosql it would be something like:
   
select user_id from documents
where my_field=my_value
and
(select count(*) from documents where my_field=my_value and
user_id=super.user_id) > X
   
I Know that solr return a 'numFound' for each query you make, but I
 dont
know how to retrieve this value in a subquery.
   
My Solr is organized in a way that a user is a document, and the
properties
of the user (such as name, age, etc) are grouped in another document
 with
a
'root_id' field. So lets suppose the following query that gets all
 the
root
documents whose children have the prefix some_prefix.
   
is_root:true AND _query_:"{!join from=root_id
to=id}requests_prefix:\"some_prefix\""
   
Now, how can I get the root documents (users in some sense) that have
more
than X children matching 'requests_prefix:some_prefix' or any other
condition? Is it possible?
   
P.S. It must be done in a single query, fields can be added at will,
 but
the root/children structure should be preserved (preferentially).
   
   
2013/9/27 Upayavira u...@odoko.co.uk
   
 Mattheus,

 Given these mails form a part of an archive that are themselves
 self-contained, can you please post your actual question here?
 You're
 more likely to get answers that way.

 Thanks, Upayavira

 On Fri, Sep 27, 2013, at 04:36 PM, Matheus Salvia wrote:
  Hello everyone,
  I'm having a problem regarding how to make a solr query, I've
 posted
   it
  on
  stackoverflow.
  Can someone help me?
 

  
 http://stackoverflow.com/questions/19039099/apache-solr-count-of-subquery-as-a-superquery-parameter
 
  Thanks in advance!
 
  --
  --
   // Matheus Salvia
  Desenvolvedor Mobile
  Celular: +55 11 9-6446-2332
  

Re: Hello and help :)

2013-09-30 Thread Marcelo Elias Del Valle
Socratees,

You wrote: "Or, what if you can facet by the field, and group by the field
count, then apply facet filtering to exclude all filters with count less
than 5?"
That's exactly what I want, I just couldn't figure out how to do it! Any
idea how I could write this query?

Best regards,
Marcelo.
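One building block worth checking here is Solr's facet.mincount parameter — e.g. facet=true&facet.field=user_id&facet.mincount=5 drops facet values with fewer than 5 matching documents. A minimal sketch of that filtering logic (illustrative data, not your schema):

```python
from collections import Counter

# Matching docs, one field value each (a stand-in for a facet field).
docs = [
    {"user_id": "u1"}, {"user_id": "u1"}, {"user_id": "u1"},
    {"user_id": "u2"},
    {"user_id": "u3"}, {"user_id": "u3"},
]

def facet_counts(docs, field, mincount=1):
    """Per-value counts for `field`, keeping only values with
    count >= mincount (what facet.mincount does server-side)."""
    counts = Counter(d[field] for d in docs)
    return {v: c for v, c in counts.items() if c >= mincount}

print(facet_counts(docs, "user_id", mincount=2))  # {'u1': 3, 'u3': 2}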


2013/9/27 Socratees Samipillai ss...@outlook.com

 Hi Marcelo,
 I haven't faced this exact situation before so I can only try posting my
 thoughts.
 Since Solr allows Result Grouping and Faceting at the same time, and since
 you can apply filters on these facets, can you take advantage of that?
 Or, What if you can facet by the field, and group by the field count, then
 apply facet filtering to exclude all filters with count less than 5?
 These links might be helpful.
 http://architects.dzone.com/articles/facet-over-same-field-multiple
 https://issues.apache.org/jira/browse/SOLR-2898
 Thanks,
 — Socratees.

  Date: Fri, 27 Sep 2013 20:32:22 -0300
  Subject: Re: Hello and help :)
  From: marc...@s1mbi0se.com.br
  To: solr-user@lucene.apache.org
 
  Ssami,
 
  I work with Matheus and I am helping him to take a look at this
  problem. We took a look at result grouping, thinking it could help us,
 but
  it has two drawbacks:
 
 - We cannot have multivalued fields, if I understood it correctly. But
 ok, we could manage that...
 - Suppose some query like that:
- select count(*) NUMBER group by FIELD where CONDITION AND NUMBER
  > 5
- In this case, we are not just taking the count for each group as
 a
result. The count actually makes part of the where clause.
- AFAIK, result grouping doesn't allow that, although I would
 really
love to be proven wrong :D
 
  We really need this, so I am trying to figure what could I change in
  solr to make this work... Any hint on that? We would need to write a
 custom
  facet / search handler / search component ? Of course we prefer a
 solution
  that works with current solr features, but we could consider writing some
  custom code to do that
 
  Thanks in advance!
 
  Best regards,
  Marcelo Valle.
 
 
  2013/9/27 ssami ss...@outlook.com
 
   If I understand your question right, Result Grouping in Solr might help
   you.
  
   Refer  here
   https://cwiki.apache.org/confluence/display/solr/Result+Grouping  .
  
  
  
  
  
   --
   View this message in context:
  
 http://lucene.472066.n3.nabble.com/Hello-and-help-tp4092371p4092439.html
   Sent from the Solr - User mailing list archive at Nabble.com.
  



multi core join and simple indexed join

2013-09-30 Thread Marcelo Elias Del Valle
Comparing indexed joins on multiple cores versus on the same core:
which one would be faster?
I am guessing doing it on multiple cores would be faster, as the index on
each core would be smaller... Any thoughts on that?
[]s


[JOB] Solr / Elasticsearch Engineer @ Sematext

2013-09-30 Thread Otis Gospodnetic
Hello,

If you are looking to work with Solr and Elasticsearch, among other
things, this may be for you:

http://blog.sematext.com/2013/09/26/solr-elasticsearch-job-engineering/

This role offers a healthy mix of Solr/ES consulting, support, and
product development.

Everything that might be of interest should be there, but I'll be
happy to answer any questions anyone may have off-list.

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm


Re: Considerations about setting maxMergedSegmentMB

2013-09-30 Thread Erick Erickson
Before going there, you can do a really simple test. Turn off
indexing and then issues a optimize/force-merge. After it
completes (and it may take quite some time) measure your
performance again to see fi this is on the right track.

Best,
Erick

On Mon, Sep 30, 2013 at 1:31 PM, Isaac Hebsh isaac.he...@gmail.com wrote:
 Hi,
 Trying to solve a query performance issue, we suspect the number of index
 segments, which might slow queries (due to I/O seeks that happen for each
 term in the query, multiplied by the number of segments).
 We are on Solr 4.3 (TieredMergePolicy with mergeFactor of 4).

 We can reduce the number of segments by enlarging maxMergedSegmentMB, from
 the default 5GB to something bigger (10GB, 15GB?).

 What are the side effects, which should be considered when doing it?
 Has anyone changed this setting in PROD for a while?


No longer allowed to store html in a 'string' type

2013-09-30 Thread Kevin Cunningham
We have been using Solr for a while now, went from 1.4 -> 3.6.  While running
some tests in 4.4 we are no longer allowed to store raw html in a document's
field with a type of 'string', which we used to be able to do. Has something
changed here?  Now we get the following error: Undeclared general entity
"nbsp" at [row,col {unknown-source}]: [11,53]

I understand what it's saying and can change the way we store and extract it
if it's a must, but I would like to understand what changed.  It sounds like
something just became stricter about adhering to the rules.

<doc>
<str name="rawcontent">
<p>Testing <a
href="/sample_group/b/sample_weblog/archive/tags/bananas/default.aspx"
class="tag hash-tag" data-tags="bananas">#bananas</a>&nbsp;tag</p> <p></p>
<p>document document document document document document</p><div
style="clear:both;"></div>
</str>
<str name="type">blog</str>
</doc>




Re: Doing time sensitive search in solr

2013-09-30 Thread Darniz
Hello,
i just wanted to make sure: can we query dynamic fields using a wildcard?
If not, then i don't think this solution will work, since i don't know the
exact concrete name of the field.





--
View this message in context: 
http://lucene.472066.n3.nabble.com/Doing-time-sensitive-search-in-solr-tp4092273p4092830.html
Sent from the Solr - User mailing list archive at Nabble.com.


Re: [JOB] Solr / Elasticsearch Engineer @ Sematext

2013-09-30 Thread Ashwin Tandel
Hi,

I would like to apply for the SEARCH CONSULTING & SEARCH SOLUTIONS ARCHITECT
position.

PFA my resume. You can reach me at 2019934403.

Thanks,
Ashwin
cell - 2019934403


On Mon, Sep 30, 2013 at 4:17 PM, Otis Gospodnetic 
otis.gospodne...@gmail.com wrote:

 Hello,

 If you are looking to work with Solr and Elasticsearch, among other
 things, this may be for you:

 http://blog.sematext.com/2013/09/26/solr-elasticsearch-job-engineering/

 This role offers a healthy mix of Solr/ES consulting, support, and
 product development.

 Everything that might be of interest should be there, but I'll be
 happy to answer any questions anyone may have off-list.

 Otis
 --
 Solr & ElasticSearch Support -- http://sematext.com/
 Performance Monitoring -- http://sematext.com/spm



AshwinTandel.docx
Description: application/vnd.openxmlformats-officedocument.wordprocessingml.document


Re: OpenJDK or OracleJDK

2013-09-30 Thread Shawn Heisey
On 9/30/2013 9:28 AM, Raheel Hasan wrote:
 hmm why is that so?
 Isnt Oracle's version a bit slow?

For Java 6, the Sun JDK is the reference implementation.  For Java 7,
OpenJDK is the reference implementation.

http://en.wikipedia.org/wiki/Reference_implementation

I don't think Oracle's version could really be called slow.  Sun
invented Java.  Sun open sourced Java.  Oracle bought Sun.

The Oracle implementation is likely more conservative than some of the
other implementations, like the one by IBM.  The IBM implementation is
pretty aggressive with optimization, so aggressive that Solr and Lucene
have a history of revealing bugs that only exist in that implementation.

Thanks,
Shawn



Re: OpenJDK or OracleJDK

2013-09-30 Thread Otis Gospodnetic
Hi,

A while back I remember we noticed some SPM users were having issues
with OpenJDK.  Since then we've been recommending Oracle's
implementation to our Solr and to SPM users.  At the same time, we
haven't seen any issues with OpenJDK in the last ~6 months.  Oracle
JDK is not slow. :)

Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Mon, Sep 30, 2013 at 11:02 PM, Shawn Heisey s...@elyograg.org wrote:
 On 9/30/2013 9:28 AM, Raheel Hasan wrote:
 hmm why is that so?
 Isnt Oracle's version a bit slow?

 For Java 6, the Sun JDK is the reference implementation.  For Java 7,
 OpenJDK is the reference implementation.

 http://en.wikipedia.org/wiki/Reference_implementation

 I don't think Oracle's version could really be called slow.  Sun
 invented Java.  Sun open sourced Java.  Oracle bought Sun.

 The Oracle implementation is likely more conservative than some of the
 other implementations, like the one by IBM.  The IBM implementation is
 pretty aggressive with optimization, so aggressive that Solr and Lucene
 have a history of revealing bugs that only exist in that implementation.

 Thanks,
 Shawn



Re: Percolate feature?

2013-09-30 Thread Otis Gospodnetic
Just came across this ancient thread.  Charlie, did this end up
happening?  I suspect Wolfgang may be interested, but that's just a
wild guess.

I was curious about your feeling that what you were open-sourcing
might be a lot faster and more flexible than ES's percolator - can you
share more about why do you have that feeling and whether you've
confirmed this?

Thanks,
Otis
--
Solr & ElasticSearch Support -- http://sematext.com/
Performance Monitoring -- http://sematext.com/spm



On Mon, Aug 5, 2013 at 6:34 AM, Charlie Hull char...@flax.co.uk wrote:
 On 03/08/2013 00:50, Mark wrote:

 We have a set number of known terms we want to match against.

 In Index:
 "term one"
 "term two"
 "term three"

 I know how to match all terms of a user query against the index but we
 would like to know how/if we can match a user's query against all the terms
 in the index?

 Search Queries:
 "my search term" = 0 matches
 "my term search one" = 1 match ("term one")
 "some prefix term two" = 1 match ("term two")
 "one two three" = 0 matches

 I can only explain this as almost a reverse search???
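The matching rule implied by these examples — a stored entry matches when every one of its words appears somewhere in the query — can be sketched as follows (illustrative only; it ignores analysis, stemming and term positions):

```python
stored_entries = ["term one", "term two", "term three"]

def percolate(query, entries=stored_entries):
    """Stored entries whose words are all present in the query."""
    words = set(query.lower().split())
    return [e for e in entries if set(e.split()) <= words]

print(percolate("my search term"))        # []
print(percolate("my term search one"))    # ['term one']
print(percolate("some prefix term two"))  # ['term two']
print(percolate("one two three"))         # []
```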

 I came across the following from ElasticSearch
 (http://www.elasticsearch.org/guide/reference/api/percolate/) and it sounds
 like this may accomplish the above but haven't tested. I was wondering if
 Solr had something similar or an alternative way of accomplishing this?

 Thanks


 Hi Mark,

 We've built something that implements this kind of reverse search for our
 clients in the media monitoring sector - we're working on releasing the core
 of this as open source very soon, hopefully in a month or two. It's based on
 Lucene.

 Just for reference it's able to apply tens of thousands of stored queries to
 a document per second (our clients often have very large and complex Boolean
 strings representing their clients' interests and may monitor hundreds of
 thousands of news stories every day). It also records the positions of every
 match. We suspect it's a lot faster and more flexible than Elasticsearch's
 Percolate feature.

 Cheers

 Charlie

 --
 Charlie Hull
 Flax - Open Source Enterprise Search

 tel/fax: +44 (0)8700 118334
 mobile:  +44 (0)7767 825828
 web: www.flax.co.uk


Problem regarding queries enclosed in double quotes in Solr 3.4

2013-09-30 Thread Kunal Mittal
We have a Solr 3.4 setup. When we try queries with double quotes, like
"semantic web", the query takes a long time to execute.
One solution we are thinking about is to make the same query without the
quotes and set the phrase slop (ps) parameter to 0. That is quite a bit
quicker than the query with the quotes and gives similar results to the
query with quotes.
Is there a way to fix this by modifying the schema.xml file? Any suggestions
would be appreciated.




--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-regarding-queries-enclosed-in-double-quotes-in-Solr-3-4-tp4092856.html
Sent from the Solr - User mailing list archive at Nabble.com.


Problem regarding queries with numbers with a decimal point

2013-09-30 Thread Kunal Mittal
We have a Solr 3.4 setup. When we try queries containing numbers with a
decimal point, like "web 2.0", the query takes a long time to execute.
One fix we did was to set generateNumberParts=0 in the
solr.WordDelimiterFilterFactory

This reduced the query time greatly but we want to reduce it further. Is
there a way to fix this by modifying the schema.xml file? Any suggestions
would be appreciated.



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Problem-regarding-queries-with-numbers-with-a-decimal-point-tp4092857.html
Sent from the Solr - User mailing list archive at Nabble.com.