Re: Addreplica throwing error when authentication is enabled

2020-09-01 Thread Ben
It appears the issue is with the encrypted file. Are these files encrypted?
If yes, you need to decrypt them first.

Caused by: javax.crypto.BadPaddingException: RSA private key operation
failed
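
For reference, the failing operation corresponds to an ADDREPLICA request like the
minimal SolrJ sketch below (the ZooKeeper address, collection/shard names and the
basic-auth credentials are placeholders; it assumes basic auth is what your
security.json enables). Judging from the stack trace, the exception is thrown on the
server side in PKIAuthenticationPlugin while the request is forwarded between nodes,
not by the client call itself.

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.CollectionAdminRequest;
import org.apache.solr.client.solrj.response.CollectionAdminResponse;

import java.util.Collections;
import java.util.Optional;

public class AddReplicaWithAuth {
    public static void main(String[] args) throws Exception {
        // Placeholder ZooKeeper ensemble address
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("localhost:2181"), Optional.empty()).build()) {

            // Same operation as the Collections API ADDREPLICA call
            CollectionAdminRequest.AddReplica req =
                    CollectionAdminRequest.addReplicaToShard("test", "shard1");

            // Placeholder credentials; assumes basic auth is enabled in security.json
            req.setBasicAuthCredentials("solr", "SolrRocks");

            CollectionAdminResponse rsp = req.process(client);
            System.out.println("ADDREPLICA status: " + rsp.getStatus());
        }
    }
}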

Best,
Ben

On Tue, Sep 1, 2020, 10:51 PM yaswanth kumar  wrote:

> Can someone please help me with the below error?
>
> Solr 8.2; ZooKeeper 3.4
>
> Enabled authentication and authorization, and made sure that the role gets
> all access.
>
> Now just add a collection with a single replica and, once done, try to
> add another replica with the ADDREPLICA Solr API, and that throws an error.
> Note: this happens only when security.json is enabled with
> authentication.
>
> Below is the error
> Collection: test operation: restore
> failed: org.apache.solr.common.SolrException: ADDREPLICA failed to create replica
>   at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1030)
>   at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler$ShardRequestTracker.processResponses(OverseerCollectionMessageHandler.java:1013)
>   at org.apache.solr.cloud.api.collections.AddReplicaCmd.lambda$addReplica$1(AddReplicaCmd.java:177)
>   at org.apache.solr.cloud.api.collections.AddReplicaCmd$$Lambda$798/.run(Unknown Source)
>   at org.apache.solr.cloud.api.collections.AddReplicaCmd.addReplica(AddReplicaCmd.java:199)
>   at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.addReplica(OverseerCollectionMessageHandler.java:708)
>   at org.apache.solr.cloud.api.collections.RestoreCmd.call(RestoreCmd.java:286)
>   at org.apache.solr.cloud.api.collections.OverseerCollectionMessageHandler.processMessage(OverseerCollectionMessageHandler.java:264)
>   at org.apache.solr.cloud.OverseerTaskProcessor$Runner.run(OverseerTaskProcessor.java:505)
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor.lambda$execute$0(ExecutorUtil.java:209)
>   at org.apache.solr.common.util.ExecutorUtil$MDCAwareThreadPoolExecutor$$Lambda$142/.run(Unknown Source)
>   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
>   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
>   at java.base/java.lang.Thread.run(Thread.java:834)
> Caused by: org.apache.solr.common.SolrException: javax.crypto.BadPaddingException: RSA private key operation failed
>   at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:325)
>   at org.apache.solr.security.PKIAuthenticationPlugin.generateToken(PKIAuthenticationPlugin.java:305)
>   at org.apache.solr.security.PKIAuthenticationPlugin.access$200(PKIAuthenticationPlugin.java:61)
>   at org.apache.solr.security.PKIAuthenticationPlugin$2.onQueued(PKIAuthenticationPlugin.java:239)
>   at org.apache.solr.client.solrj.impl.Http2SolrClient.decorateRequest(Http2SolrClient.java:468)
>   at org.apache.solr.client.solrj.impl.Http2SolrClient.makeRequest(Http2SolrClient.java:455)
>   at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:364)
>   at org.apache.solr.client.solrj.impl.Http2SolrClient.request(Http2SolrClient.java:746)
>   at org.apache.solr.client.solrj.SolrClient.request(SolrClient.java:1274)
>   at org.apache.solr.handler.component.HttpShardHandler.request(HttpShardHandler.java:238)
>   at org.apache.solr.handler.component.HttpShardHandler.lambda$submit$0(HttpShardHandler.java:199)
>   at org.apache.solr.handler.component.HttpShardHandler$$Lambda$512/.call(Unknown Source)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
>   at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
>   at com.codahale.metrics.InstrumentedExecutorService$InstrumentedRunnable.run(InstrumentedExecutorService.java:181)
>   ... 5 more
> Caused by: javax.crypto.BadPaddingException: RSA private key operation failed
>   at java.base/sun.security.rsa.NativeRSACore.crtCrypt_Native(NativeRSACore.java:149)
>   at java.base/sun.security.rsa.NativeRSACore.rsa(NativeRSACore.java:91)
>   at java.base/sun.security.rsa.RSACore.rsa(RSACore.java:149)
>   at java.base/com.sun.crypto.provider.RSACipher.doFinal(RSACipher.java:355)
>   at java.base/com.sun.crypto.provider.RSACipher.engineDoFinal(RSACipher.java:392)
>   at java.base/javax.crypto.Cipher.doFinal(Cipher.java:2260)
>   at org.apache.solr.util.CryptoKeys$RSAKeyPair.encrypt(CryptoKeys.java:323)
>   ... 20 more
>
> That's the error stack trace I

Re: Solr Down Issue

2020-08-09 Thread Ben
Can you send solr logs?

Best,
Ben

On Sun, Aug 9, 2020, 9:55 AM Rashmi Jain  wrote:

> Hello Team,
>
> I am Rashmi Jain; we implemented Solr on one of our sites,
> bookswagon.com<https://www.bookswagon.com/>. For the last 2-3 months we have been facing a
> strange issue: Solr goes down suddenly, without warning. We checked the Solr
> logs and also checked the application logs, but found no clue there regarding
> this.
> We have implemented Solr 7.4 on Java SE 10 and have index
> data of around 28 million books.
> Also, we are running Solr on Windows Server 2012 Standard
> with 32 GB RAM.
> Please help us with this.
>
> Regards,
> Rashmi
>
>
>


No Client EndPointIdentificationAlgorithm configured for SslContextFactory

2020-07-21 Thread Ben
Hello Everyone,

I just downloaded Sitecore 9.3.0 and installed Solr using the JSON file
that Sitecore provided. The installation was seamless and Solr was working
as expected. But when I checked the logs I saw this warning; I am
attaching the Solr logs as well for your reference.

o.e.j.u.s.S.config No Client EndPointIdentificationAlgorithm configured for
SslContextFactory@1a2e2935
[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]

This appears to be an issue on Solr 8.0, 8.1 and even 8.2. Can
you please confirm? As a workaround I have updated the entry in the
jetty-ssl.xml file (steps below). Is there a fix or a patch for this
issue?

Stop the Solr service

Go to Path - D:\Solr\\server\etc\jetty-ssl.xml

Open the jetty-ssl.xml file

Add the below entry to the SslContextFactory <New> element:
<Set name="EndpointIdentificationAlgorithm">null</Set>


Hope to hear back from you soon.

Best,
Ben
2020-07-21 13:21:02.786 INFO  (main) [   ] o.e.j.u.log Logging initialized 
@4907ms to org.eclipse.jetty.util.log.Slf4jLog
2020-07-21 13:21:03.004 WARN  (main) [   ] o.e.j.s.AbstractConnector Ignoring 
deprecated socket close linger time
2020-07-21 13:21:03.004 INFO  (main) [   ] o.e.j.s.Server 
jetty-9.4.14.v20181114; built: 2018-11-14T21:20:31.478Z; git: 
c4550056e785fb5665914545889f21dc136ad9e6; jvm 1.8.0_222-b10
2020-07-21 13:21:03.036 INFO  (main) [   ] o.e.j.d.p.ScanningAppProvider 
Deployment monitor [file:///D:/Solr/solr-8.1.1/server/contexts/] at interval 0
2020-07-21 13:21:03.473 INFO  (main) [   ] o.e.j.w.StandardDescriptorProcessor 
NO JSP Support for /solr, did not find org.apache.jasper.servlet.JspServlet
2020-07-21 13:21:03.473 INFO  (main) [   ] o.e.j.s.session 
DefaultSessionIdManager workerName=node0
2020-07-21 13:21:03.473 INFO  (main) [   ] o.e.j.s.session No SessionScavenger 
set, using defaults
2020-07-21 13:21:03.489 INFO  (main) [   ] o.e.j.s.session node0 Scavenging 
every 60ms
2020-07-21 13:21:03.536 INFO  (main) [   ] o.a.s.u.c.SSLConfigurations Setting 
javax.net.ssl.keyStorePassword
2020-07-21 13:21:03.536 INFO  (main) [   ] o.a.s.u.c.SSLConfigurations Setting 
javax.net.ssl.trustStorePassword
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter Using 
logger factory org.apache.logging.slf4j.Log4jLoggerFactory
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter  ___  
_   Welcome to Apache Solr™ version 8.1.1
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter / __| 
___| |_ _   Starting in standalone mode on port 8983
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter \__ \/ _ 
\ | '_|  Install dir: D:\Solr\solr-8.1.1
2020-07-21 13:21:03.567 INFO  (main) [   ] o.a.s.s.SolrDispatchFilter 
|___/\___/_|_|Start time: 2020-07-21T13:21:03.567Z
2020-07-21 13:21:03.583 INFO  (main) [   ] o.a.s.c.SolrResourceLoader Using 
system property solr.solr.home: D:\Solr\solr-8.1.1\server\solr
2020-07-21 13:21:03.598 INFO  (main) [   ] o.a.s.c.SolrXmlConfig Loading 
container configuration from D:\Solr\solr-8.1.1\server\solr\solr.xml
2020-07-21 13:21:03.692 INFO  (main) [   ] o.a.s.c.SolrXmlConfig MBean server 
found: com.sun.jmx.mbeanserver.JmxMBeanServer@1677d1, but no JMX reporters were 
configured - adding default JMX reporter.
2020-07-21 13:21:03.973 INFO  (main) [   ] o.a.s.h.c.HttpShardHandlerFactory 
Host whitelist initialized: WhitelistHostChecker [whitelistHosts=null, 
whitelistHostCheckingEnabled=true]
2020-07-21 13:21:04.004 WARN  (main) [   ] o.a.s.c.s.i.Http2SolrClient Create 
Http2SolrClient with HTTP/1.1 transport since Java 8 or lower versions does not 
support SSL + HTTP/2
2020-07-21 13:21:04.083 INFO  (main) [   ] o.e.j.u.s.SslContextFactory 
x509=X509@15dcfae7(solr_ssl_trust_store,h=[companydomain],w=[companydomain]) 
for 
SslContextFactory@3da05287[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]
2020-07-21 13:21:04.129 WARN  (main) [   ] o.e.j.u.s.S.config No Client 
EndPointIdentificationAlgorithm configured for 
SslContextFactory@3da05287[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]
2020-07-21 13:21:04.239 WARN  (main) [   ] o.a.s.c.s.i.Http2SolrClient Create 
Http2SolrClient with HTTP/1.1 transport since Java 8 or lower versions does not 
support SSL + HTTP/2
2020-07-21 13:21:04.254 INFO  (main) [   ] o.e.j.u.s.SslContextFactory 
x509=X509@bff34c6(solr_ssl_trust_store,h=[companydomain],w=[companydomain]) for 
SslContextFactory@1522d8a0[provider=null,keyStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks,trustStore=file:///D:/Solr/solr-8.1.1/server/etc/solr-ssl_keystore.jks]
2020-07-21 13:21:04.254 WARN  (main) [   ] o.e.j.u.s.S.config No Client 
EndPointIdentificationAlgorithm configured for 
SslContextFactory@1522d8a0[provider=null

SolrClient.ping() in 8.2, using SolrJ

2019-08-25 Thread Ben Friedman
Before I submit a new bug, I should ask you folks if this is my error.

I started a local SolrCloud instance with two nodes and two replicas per
node.  I created one empty collection on each node.

I tried to use the ping method in Solrj to verify my connected client.
When I try to use it, it throws ...

Caused by: org.apache.solr.common.SolrException: No collection param
specified on request and no default collection has been set: []
at
org.apache.solr.client.solrj.impl.BaseCloudSolrClient.sendRequest(BaseCloudSolrClient.java:1071)
~[solr-solrj-8.2.0.jar:8.2.0 31d7ec7bbfdcd2c4cc61d9d35e962165410b65fe -
ivera - 2019-07-19 15:11:07]

I cannot pass a collection name to the ping request.  And the
CloudSolrClient.Builder does not allow me to declare a default collection.
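
A rough sketch of the workaround I am considering (assuming SolrJ 8.2; the ZooKeeper
address and collection name are placeholders): setDefaultCollection() is available on
the built client even though the Builder itself has no such option, and a SolrPing
request can be pointed at an explicit collection:

import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.request.SolrPing;
import org.apache.solr.client.solrj.response.SolrPingResponse;

import java.util.Collections;
import java.util.Optional;

public class PingWorkaround {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("localhost:9983"), Optional.empty()).build()) {

            // Option 1: set a default collection on the built client, then ping()
            client.setDefaultCollection("gettingstarted");
            SolrPingResponse viaDefault = client.ping();

            // Option 2: send an explicit SolrPing request to one collection
            SolrPingResponse viaRequest = new SolrPing().process(client, "gettingstarted");

            System.out.println(viaDefault.getStatus() + " / " + viaRequest.getStatus());
        }
    }
}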

I'm not sure why a collection would be required for a ping.  And I'm not
sure why it does not automatically use the only collection created.

Have any suggestions for me?  Thank you.


Re: Inconsistent leader between ZK and Solr and a lot of downtime

2018-10-23 Thread Ben Knüttgen
Daniel Carrasco wrote
> Hello,
> 
> I'm investigating an 8-node Solr 7.2.1 cluster because we have a lot of
> problems; for example, when a node fails to import from a DB (maybe it
> freezes), the entire cluster goes down, and the leader won't change even when
> it is down (all nodes detect that it is down, but no leader election is
> triggered), and similar problems. Every few days we have to recover the
> cluster because it becomes unstable and goes down.
> 
> The latest problem I've got is three collections that have nodes in the
> "recovery" state for many hours, and the log shows an error saying
> that the "leader node is not the leader", so I'm trying to change the leader.

Make sure that the clocks on your servers are in sync. Otherwise inter node
authentication tokens could time out which could lead to the problems you
described. You should find hints to the cause of the communication problem
in your Solr logs.



--
Sent from: http://lucene.472066.n3.nabble.com/Solr-User-f472068.html


Re: Sort Facet Values by "Interestingness"?

2016-08-03 Thread Ben Heuwing

Hi Joel,

thank you, this sounds great!

As to your first proposal: I am a bit out of my depth here, as I have 
not worked with streaming expressions so far. But I will try out your 
example using the facet() expression on a simple use case as soon as you 
publish it.


Using the TermsComponent directly, would that imply that I have to 
retrieve all possible candidates and then send them back as a 
terms.list to get their df? However, I assume that this would still be 
faster than having two repeated facet calls. Or did you suggest using the 
component in a customized RequestHandler?


Regards,

Ben

On 03.08.2016 at 14:57, Joel Bernstein wrote:

Also the TermsComponent now can export the docFreq for a list of terms and
the numDocs for the index. This can be used as a general purpose mechanism
for scoring facets with a callback.

https://issues.apache.org/jira/browse/SOLR-9243

Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Aug 3, 2016 at 8:52 AM, Joel Bernstein<joels...@gmail.com>  wrote:


What you're describing is implemented with Graph aggregations in this
ticket using tf-idf. Other scoring methods can be implemented as well.

https://issues.apache.org/jira/browse/SOLR-9193

I'll update this thread with a description of how this can be used with
the facet() streaming expression as well as with graph queries later today.



Joel Bernstein
http://joelsolr.blogspot.com/

On Wed, Aug 3, 2016 at 8:18 AM,<heuw...@uni-hildesheim.de>  wrote:


Dear everybody,

as the JSON API now makes configuration of facets and sub-facets easier,
there appears to be a lot of potential to enable instant calculation of
facet recommendations for a query, that is, to sort facets by their
relative importance/interestingness/significance for the current query relative
to the complete collection, or relative to a result set defined by a
different query.

An example would be to show the most typical terms which are used in
descriptions of horror-movies, in contrast to the most popular ones for
this query, as these may include terms that occur as often in other genres.

This feature has been discussed earlier in the context of solr:
*
http://stackoverflow.duapp.com/questions/26399264/how-can-i-sort-facets-by-their-tf-idf-score-rather-than-popularity
*
http://lucene.472066.n3.nabble.com/Facets-with-an-IDF-concept-td504070.html

In elasticsearch, the specific feature that I am looking for is called
Significant Terms Aggregation:
https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-bucket-significantterms-aggregation.html#search-aggregations-bucket-significantterms-aggregation

As of now, I have two questions:

a) Are there workarounds in the current solr-implementation or known
patches that implement such a sort-option for fields with a large number of
possible values, e.g. text-fields? (for smaller vocabularies it is easy to
do this client-side with two queries)
b) Are there plans to implement this in facet.pivot or in the
facet.json-API?

The first step could be to define "interestingness" as a sort-option for
facets and to define interestingness as facet-count in the result-set as
compared to the complete collection: documentfrequency_termX(bucket) *
inverse_documentfrequency_termX(collection)
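
To make that first step concrete, here is a rough SolrJ sketch of the client-side
two-query workaround mentioned in (a) above (the core URL, the faceted field and the
bucket-defining query are made-up placeholders; the df counts come from plain facet
counts, so this only makes sense for smaller vocabularies):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.FacetField;

import java.util.HashMap;
import java.util.Map;

public class InterestingnessSketch {

    public static void main(String[] args) throws Exception {
        String field = "description_terms";   // hypothetical indexed text field
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/movies").build()) {   // hypothetical core

            long numDocs = solr.query(new SolrQuery("*:*").setRows(0))
                               .getResults().getNumFound();

            // document frequency per term over the whole collection (the base domain)
            Map<String, Long> collectionDf = facetCounts(solr, "*:*", field);
            // document frequency per term within the bucket / result set of interest
            Map<String, Long> bucketDf = facetCounts(solr, "genre:horror", field);

            // interestingness = df(bucket) * idf(collection) = df_bucket * log(N / df_collection)
            Map<String, Double> scores = new HashMap<>();
            for (Map.Entry<String, Long> e : bucketDf.entrySet()) {
                long dfCollection = Math.max(1, collectionDf.getOrDefault(e.getKey(), 1L));
                scores.put(e.getKey(), e.getValue() * Math.log((double) numDocs / dfCollection));
            }

            // print the ten most "interesting" terms for the bucket
            scores.entrySet().stream()
                  .sorted(Map.Entry.<String, Double>comparingByValue().reversed())
                  .limit(10)
                  .forEach(e -> System.out.println(e.getKey() + "  " + e.getValue()));
        }
    }

    private static Map<String, Long> facetCounts(HttpSolrClient solr, String q, String field)
            throws Exception {
        SolrQuery query = new SolrQuery(q).setRows(0)
                .setFacet(true).addFacetField(field)
                .setFacetLimit(1000).setFacetMinCount(1);
        Map<String, Long> counts = new HashMap<>();
        FacetField ff = solr.query(query).getFacetField(field);
        for (FacetField.Count c : ff.getValues()) {
            counts.put(c.getName(), c.getCount());
        }
        return counts;
    }
}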

As an extension, the JSON-API could be used to change the domain used as
base for the comparison. Another interesting option would be to compare
facet-counts against a current parent-facet for nested facets, e.g. the 5
most interesting terms by genre for a query on 70s movies, returning the
terms specific to horror, comedy, action etc. compared to all terminology
at the time (i.e. in the parent-query).

A call-back-function could be used to define other measures of
interestingness such as the log-likelihood-ratio (
http://tdunning.blogspot.de/2008/03/surprise-and-coincidence.html). Most
measures need at least the following 4 values: document-frequency for a
term for the result-set, document-frequency for the result-set,
document-frequency for a term in the index (or base-domain),
document-frequency in the index (or base-domain).

I guess, this feature might be of interest for those who want to do some
small-scale term-analysis in addition to search, e.g. as in my case in
digital humanities projects. But it might also be an interesting navigation
device, e.g. when searching on job-offers to show the skills that are most
distinctive for a category.

It would be great to know, if others are interested in this feature. If
there are any implementations out there or if anybody else is working on
this, a pointer would be a great start. In the absence of existing
solutions: Perhaps somebody has some idea on where and how to start
implementing this?

Best regards,

Ben





--

Ben Heuwing, Dr. phil.
Research Associate
Institut für Informationswissenschaft und Sprachtechnologie
Universität Hildesheim

Postal address:
Universitätsplatz 1
D-31141 Hildesheim


Office:
Lübeckerstraße 3
Rau



check If I am Still Leader

2015-04-16 Thread Adir Ben Ami

Hi,

I am using Solr 4.10.0 with tomcat and embedded Zookeeper.
I use SolrCloud in my system.

Each shard machine tries to reach/connect to the other cluster machines in order to 
index the document; it just checks whether it is still the leader.
I don't use replication, so why does it have to check who the leader is?
How can I bypass this constraint and make my SolrCloud not use 
ClusterStateUpdater.checkIfIamStillLeader when I am indexing?

Thanks,
Adir.   
  

RE: check If I am Still Leader

2015-04-16 Thread Adir Ben Ami







I have not mentioned before that index requests are always routed to a specific 
machine.
Is there a way to avoid connectivity from that node to all other nodes? 



 From: adi...@hotmail.com
 To: solr-user@lucene.apache.org
 Subject: check If I am Still Leader
 Date: Thu, 16 Apr 2015 16:08:15 +0300
 
 
 Hi,
 
 I am using Solr 4.10.0 with tomcat and embedded Zookeeper.
 I use SolrCloud in my system.
 
 Each Shard machine try to reach/connect with other cluster machines in order 
 to index the document ,it just checks if it is still the leader.
  I don't use replication so why does it has to check who is the leader?
 How can I bypass this constraint and make my solrcloud not use 
 ClusterStateUpdater.checkIfIamStillLeader when i am indexing?
 
 Thanks,
 Adir. 
   
  

newbie questions regarding solr cloud

2015-04-02 Thread Ben Hsu
Hello

I am playing with solr5 right now, to see if its cloud features can replace
what we have with solr 3.6, and I have some questions, some newbie, and
some not so newbie

Background: the documents we are putting in solr have a date field. the
majority of our searches are restricted to documents created within the
last week, but searches do go back 60 days. documents older than 60 days
are removed from the repo. we also want high availability in case a machine
becomes unavailable

our current method, using solr 3.6, is to split the data into 1 day chunks,
within each day the data is split into several shards, and each shard has 2
replicas. Our code generates the list of cores to be queried on based on
the time ranged in the query. Cores that fall off the 60 day range are
deleted through solr's RESTful API.

This all sounds a lot like what Solr Cloud provides, so I started looking
at Solr Cloud's features.

My newbie questions:

 - it looks like the way to write a document is to pick a node (possibly
using a LB), send it to that node, and let solr figure out which nodes that
document is supposed to go. is this the recommended way?
 - similarly, can I just randomly pick a core (using the demo example:
http://localhost:7575/solr/#/gettingstarted_shard1_replica2/query ), query
it, and let it scatter out the queries to the appropriate cores, and send
me the results back? will it give me back results from all the shards?
 - is there a recommended Python library?
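
Regarding the first two bullets above, this is roughly what I have in mind - a minimal
sketch assuming a recent SolrJ with the ZooKeeper-aware CloudSolrClient (the zkHost,
collection and field names are placeholders based on the getting-started example):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CloudSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrInputDocument;

import java.util.Collections;
import java.util.Optional;

public class CloudRoutingSketch {
    public static void main(String[] args) throws Exception {
        try (CloudSolrClient client = new CloudSolrClient.Builder(
                Collections.singletonList("localhost:9983"), Optional.empty()).build()) {
            client.setDefaultCollection("gettingstarted");

            // Indexing: the client (or whichever node you post to) routes the
            // document to the shard leader that owns it
            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "doc-1");
            doc.addField("created_dt", "2015-04-02T00:00:00Z");
            client.add(doc);
            client.commit();

            // Querying: any replica fans the request out to one replica of every
            // shard and merges the results, so hits come back from all shards
            QueryResponse rsp = client.query(new SolrQuery("created_dt:[NOW-7DAYS TO NOW]"));
            System.out.println(rsp.getResults().getNumFound());
        }
    }
}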

My hopefully less newbie questions:
 - does solr auto detect when nodes become unavailable, and stop sending
queries to them?
 - when the master node dies and the cluster elects a new master, what
happens to writes?
 - what happens when a node is unavailable?
 - what is the procedure when a shard becomes too big for one machine, and
needs to be split?
 - what is the procedure when we lose a machine and the node needs replacing?
 - how would we quickly bulk delete data within a date range?


Re: ReplicationHandler - SnapPull failed to download a file completely.

2013-10-31 Thread Shalom Ben-Zvii Kazaz
Shawn, thank you for your answer.
For the purpose of testing it, we have a test environment where we are not
indexing anymore. We also disabled the DIH delta import, so as I understand it
there shouldn't be any commits on the master.
I also tried with
<str name="commitReserveDuration">50:50:50</str>
and get the same failure.

I tried changing and increasing various parameters on the master and slave,
but no luck yet.
The master is functioning OK; we do have search results, so I assume there
is no index corruption on the master side.
Just to mention, we have done this many times before in the past few
years; this started just now when we upgraded our Solr from version 3.6 to
version 4.3 and we reindexed all documents.

If we have no solution soon, and this is holding up an upgrade of our
production site and various customers, do you think we can copy the index
directory from the master to the slave and hope that future replication
will work?

Thank you again.

Shalom





On Wed, Oct 30, 2013 at 10:00 PM, Shawn Heisey s...@elyograg.org wrote:

 On 10/30/2013 1:49 PM, Shalom Ben-Zvi Kazaz wrote:

 we are continuously getting this exception during replication from
 master to slave. our index size is 9.27 G and we are trying to replicate
 a slave from scratch.
 Its a different file each time , sometimes we get to 60% replication
 before it fails and sometimes only 10%, we never managed a successful
 replication.


 snip


  this is the master setup:

 <requestHandler name="/replication" class="solr.ReplicationHandler">
   <lst name="master">
     <str name="replicateAfter">commit</str>
     <str name="replicateAfter">startup</str>
     <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
     <str name="commitReserveDuration">00:00:50</str>
   </lst>
 </requestHandler>


  I assume that you're probably doing commits fairly often, resulting in a
  lot of merge activity that frequently deletes segments.  That
  commitReserveDuration parameter needs to be made larger.  I would imagine
  that it takes a lot more than 50 seconds to do the replication - even if
  you've got an extremely fast network, replicating 9.7GB probably takes
  several minutes.

  From the wiki page on replication:  If your commits are very frequent and
  network is particularly slow, you can tweak an extra attribute
  <str name="commitReserveDuration">00:00:10</str>. This is roughly the time
  taken to download 5MB from master to slave. Default is 10 secs.

  http://wiki.apache.org/solr/SolrReplication#Master

 You've said that your network is not slow, but with that much data, all
 networks are slow.

 Thanks,
 Shawn




Re: [SOLVED] ReplicationHandler - SnapPull failed to download a file completely.

2013-10-31 Thread Shalom Ben-Zvii Kazaz
 [explicit-fetchindex-cmd] DEBUG
CachingDirectoryFactory - Removing from cache:
CachedDirrefCount=0;path=/opt/watchdox/solr-slave/data/index.20131031180837277;done=true
31 Oct 2013 18:10:40,878 [explicit-fetchindex-cmd] DEBUG
CachingDirectoryFactory - Releasing directory:
/opt/watchdox/solr-slave/data/index 1 false
31 Oct 2013 18:10:40,879 [explicit-fetchindex-cmd] ERROR ReplicationHandler
- SnapPull failed :org.apache.solr.common.SolrException: Unable to download
_aa7_Lucene41_0.pos completely. Downloaded 0!=1081710
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1212)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1092)
at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)

31 Oct 2013 18:10:40,910 [http-bio-8080-exec-8] DEBUG
CachingDirectoryFactory - Reusing cached directory:
CachedDirrefCount=2;path=/opt/watchdox/solr-slave/data/index;done=false




So I upgraded the httpcomponents jars to their latest 4.3.x versions and the
problem disappeared.
The httpcomponents jars, which are dependencies of SolrJ, were at version 4.2.x;
I upgraded to httpclient-4.3.1, httpcore-4.3 and httpmime-4.3.1.
I ran the replication a few times now and there is no problem at all; it is now
working as expected.
It seems that the upgrade is necessary only on the slave side, but I'm going
to upgrade the master too.


Thank you so much for your help.

Shalom








On Thu, Oct 31, 2013 at 6:46 PM, Shawn Heisey s...@elyograg.org wrote:

 On 10/31/2013 7:26 AM, Shalom Ben-Zvii Kazaz wrote:
  Shawn, Thank you for your answer.
  for the purpose of testing it we have a test environment where we are not
  indexing anymore. We also disabled the DIH delta import. so as I
 understand
  there shouldn't be any commits on the master.
  I also tried with
  str name=commitReserveDuration50:50:50/str
  and get the same failure.

 If it's in an environment where there are no commits, that's really
 odd.  I would suspect underlying filesystem or network issues.  There's
 one problem that's not well known, but is very common - problems with
 NIC firmware, most commonly Broadcom NICs.  These problems result in
 things working correctly almost all the time, but when there is a high
 network load, things break in strange ways, and the resulting errors
 rarely look like they are network-related.

 Most embedded NICs are either Broadcom or Realtek, both of which are
 famous for their firmware problems.  Broadcom NICs are very common on
 Dell and HP servers.  Upgrading the firmware (which is not usually the
 same thing as upgrading the driver) is the only fix.  NICs from other
 manufacturers also have upgradable firmware, but don't usually have the
 same kind of high-profile problems as Broadcom.

 The NIC firmware might not have anything to do with this problem, but
 it's the only thing left that I can think of.  I personally haven't used
 replication since Solr 1.4.1, but a lot of people do.  I can't say that
 there's no bugs, but so far I'm not seeing the kind of problem reports
 that appear when a bug in a critical piece of the software exists.

 Thanks,
 Shawn




ReplicationHandler - SnapPull failed to download a file completely.

2013-10-30 Thread Shalom Ben-Zvi Kazaz
we are continuously getting this exception during replication from
master to slave. Our index size is 9.27 GB and we are trying to replicate
a slave from scratch.
It's a different file each time; sometimes we get to 60% replication
before it fails and sometimes only 10%. We never managed a successful
replication.

30 Oct 2013 18:38:52,884 [explicit-fetchindex-cmd] ERROR
ReplicationHandler - SnapPull failed
:org.apache.solr.common.SolrException: Unable to download
_aa7_Lucene41_0.tim completely. Downloaded 0!=1054090
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.cleanup(SnapPuller.java:1244)
at
org.apache.solr.handler.SnapPuller$DirectoryFileFetcher.fetchFile(SnapPuller.java:1124)
at
org.apache.solr.handler.SnapPuller.downloadIndexFiles(SnapPuller.java:719)
at
org.apache.solr.handler.SnapPuller.fetchLatestIndex(SnapPuller.java:397)
at
org.apache.solr.handler.ReplicationHandler.doFetch(ReplicationHandler.java:317)
at
org.apache.solr.handler.ReplicationHandler$1.run(ReplicationHandler.java:218)

I read in some thread that there was a related bug in Solr 4.1, but we
are using Solr 4.3 and have tried with 4.5.1 also.
It seems that DirectoryFileFetcher sometimes cannot download a file;
the file is downloaded to the slave with size zero.
We are running in a test environment where bandwidth is high.

this is the master setup:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="master">
    <str name="replicateAfter">commit</str>
    <str name="replicateAfter">startup</str>
    <str name="confFiles">stopwords.txt,spellings.txt,synonyms.txt,protwords.txt,elevate.xml,currency.xml</str>
    <str name="commitReserveDuration">00:00:50</str>
  </lst>
</requestHandler>

and the slave setup:

<requestHandler name="/replication" class="solr.ReplicationHandler">
  <lst name="slave">
    <str name="masterUrl">http://solr-master.saltdev.sealdoc.com:8081/solr-master</str>
    <str name="httpConnTimeout">15</str>
    <str name="httpReadTimeout">30</str>
  </lst>
</requestHandler>



edismax behaviour with japanese

2013-07-11 Thread Shalom Ben-Zvi Kazaz
Hello,
I have a text and a text_ja field, where text uses the English analyzer and
text_ja the Japanese analyzer; I index both with copyField from other fields.
I'm trying to search both fields using edismax and the qf parameter, but I
see strange behaviour from edismax. I wonder if someone can give me a
hint as to what's going on and what I am doing wrong?

When I run this query I can see that Solr is searching both fields, but the
text_ja: query contains only part of the text while text: contains the complete text.
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このたびは

<lst name="debug">
  <str name="rawquerystring">このたびは</str>
  <str name="querystring">このたびは</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((text_ja:たび | text:このたびは)))/no_coord</str>
  <str name="parsedquery_toString">+(text_ja:たび | text:このたびは)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>


Now, if I remove the last two characters from the query string, Solr will
not search text_ja at all; at least that's what I understand from the debug
output:
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=このた

<lst name="debug">
  <str name="rawquerystring">このた</str>
  <str name="querystring">このた</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((text:このた)))/no_coord</str>
  <str name="parsedquery_toString">+(text:このた)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>

With another string of Japanese text, Solr now splits the query into multiple
text_ja queries:
http://localhost/solr/core0/select/?indent=on&rows=100&debug=query&defType=edismax&qf=text+text_ja&q=システムをお買い求めいただき

<lst name="debug">
  <str name="rawquerystring">システムをお買い求めいただき</str>
  <str name="querystring">システムをお買い求めいただき</str>
  <str name="parsedquery">(+DisjunctionMaxQuery((((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)))/no_coord</str>
  <str name="parsedquery_toString">+(((text_ja:システム text_ja:買い求める text_ja:いただく)~3) | text:システムをお買い求めいただき)</str>
  <str name="QParser">ExtendedDismaxQParser</str>
</lst>



Thank you.


searching both english and japanese

2013-07-07 Thread Shalom Ben-Zvi Kazaz
Hi,
We have a customer that needs support for both English and Japanese; a
document can be in either of the two and we have no indication of the
language of a document. So I know I can construct a schema with both
English and Japanese fields and index them with copyField. I also know
I can detect the language and index only the relevant fields, but I want
to support mixed-language documents, so I think I need to index into both
English and Japanese fields. We are using the standard request handler,
not dismax, and we want to keep using it, as our queries should be on
certain fields with no errors.
Queries are user entered and can be any valid query, like q=lexmark or
q=docname:lexmark AND content:printer. Now, what I think I want is to
add the Japanese fields to this query and end up with q=docname:lexmark
OR docname_ja:lexmark, or q=(docname:lexmark AND content:printer) OR
(docname_ja:lexmark AND content_ja:printer). Of course I cannot ask
the user to do that. Also, we have only one default field, and it must
be Japanese or English but not both. I think the default-field issue could be
solved by using dismax and specifying multiple default fields with qf, but we
don't use dismax.
We use SolrJ as our client, and it would be better if I could do
something on the client side and not on the Solr side.

Any help/idea is appreciated.
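
To make the idea concrete, here is a rough sketch of the client-side rewrite I have in
mind (the field names are the ones from the example above; the string handling is
deliberately naive and only illustrates the intent, it is not a proper query parser):

import org.apache.solr.client.solrj.SolrQuery;

public class BilingualQuerySketch {
    public static void main(String[] args) {
        String userQuery = "docname:lexmark AND content:printer";

        // naive duplication of the query against the *_ja variants of the fields
        String japaneseQuery = userQuery
                .replace("docname:", "docname_ja:")
                .replace("content:", "content_ja:");

        SolrQuery q = new SolrQuery("(" + userQuery + ") OR (" + japaneseQuery + ")");
        System.out.println(q.getQuery());
        // -> (docname:lexmark AND content:printer) OR (docname_ja:lexmark AND content_ja:printer)
    }
}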


filter result by numFound in Result Grouping

2013-05-09 Thread Shalom Ben-Zvi Kazaz
Hello list
In one of our searches that uses Result Grouping, we need to
filter results to only those groups that have more than one document in the
group, or more specifically to groups that have two documents.
Is this possible in some way?

Thank you


RE: How to deal with cache for facet search when index is always increment?

2013-05-01 Thread Kuai, Ben
Hi

You can give soft-commit a try.
More details available here  http://wiki.apache.org/solr/NearRealtimeSearch
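
If you want to try it quickly from SolrJ, a minimal sketch (assuming a reasonably
recent SolrJ; the URL and core name are placeholders, and in practice you would
usually configure autoSoftCommit in solrconfig.xml rather than committing from the
client):

import org.apache.solr.client.solrj.SolrClient;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrInputDocument;

public class SoftCommitSketch {
    public static void main(String[] args) throws Exception {
        try (SolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {

            SolrInputDocument doc = new SolrInputDocument();
            doc.addField("id", "42");
            solr.add(doc);

            // soft commit: waitFlush=true, waitSearcher=true, softCommit=true.
            // Opens a new searcher without fsyncing segments to disk, so the new
            // doc becomes searchable quickly.
            solr.commit(true, true, true);
        }
    }
}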


-Original Message-
From: 李威 [mailto:li...@antvision.cn] 
Sent: Thursday, 2 May 2013 12:02 PM
To: solr-user
Cc: 李景泽; 罗佳
Subject: How to deal with cache for facet search when index is always increment?

Hi folks,


For facet search, Solr builds a cache based on the whole set of docs. If I
import a new doc into the index, the cache becomes outdated and needs to be created
again.
For real-time search, docs can be imported into the index at any time. In this case,
the cache nearly always needs to be rebuilt, which makes facet search
very slow.
Do you have any idea how to deal with such a problem?


Thanks,
Wei Li


RE: Sorting on Score Problem

2013-01-24 Thread Kuai, Ben
Hi Hoss

Thanks for the reply.

Unfortunately we have other customized similarity classes and I don't know how 
to disable them and still make the query work. 

I will try to attach more information once I work out how to simplify the issue.

Thanks
Ben

From: Chris Hostetter [hossman_luc...@fucit.org]
Sent: Thursday, January 24, 2013 12:34 PM
To: solr-user@lucene.apache.org
Subject: Re: Sorting on Score Problem

: We met a wired problem in our project when sorting by score in Solr 4.0,
: the biggest score document is not a the top the debug explanation from
: solr are like this,

that's weird ... can you post the full debugQuery output of an example
query showing the problem, using echoParams=all & fl=id,score (or
whatever unique key field you have)

also: can you elaborate whether you are using a single node setup or a
distributed (ie: SolrCloud) query?

: Then we thought it could be a float rounding problem then we implement
: our own similarity class to increse queryNorm by 10,000 and it changes
: the score scale but the rank is still wrong.

when you post the details request above, please don't use your custom
similarity (just the out of the box solr code) so there's one less
variable in the equation.


-Hoss


Sorting on Score Problem

2013-01-23 Thread Kuai, Ben
Hi

We hit a weird problem in our project when sorting by score in Solr 4.0: the 
document with the biggest score is not at the top. The debug explanations from 
Solr look like this,

First Document
1.8412635 = (MATCH) sum of:
  2675.7964 = (MATCH) sum of:
0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) btq, product of:
  0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
..

Second Document
1.8412637 = (MATCH) sum of:
  0.26757964 = (MATCH) sum of:
0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) btq, product of:
  0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
.

Third Document
1.841253 = (MATCH) sum of:
  2675.7964 = (MATCH) sum of:
0.0 = (MATCH) sum of:
  0.0 = (MATCH) max of:
0.0 = (MATCH) btq, product of:
  0.0 = weight(nameComplexNoTfNoIdf:plumber^0.0 in 0) [], result of:
0.0 = score(doc=0,freq=1.0 = phraseFreq=1.0
...


Then we thought it could be a float rounding problem, so we implemented our own 
similarity class to increase queryNorm by 10,000; it changes the score scale 
but the ranking is still wrong.

Does anyone have a similar issue?

I can debug with the Solr source code; please shed some light on the sorting part.

Thanks


RE: sort by function error

2012-11-13 Thread Kuai, Ben
Hi Yonik

I will give the latest 4.0 release a try. 

Thanks anyway.

Cheers
Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
[yo...@lucidworks.com]
Sent: Tuesday, November 13, 2012 2:04 PM
To: solr-user@lucene.apache.org
Subject: Re: sort by function error

I can't reproduce this with the example data.  Here's an example of
what I tried:

http://localhost:8983/solr/query?q=*:*&sort=geodist(store,-32.123323,108.123323)+asc&group.field=inStock&group=true

Perhaps this is an issue that's since been fixed.

-Yonik
http://lucidworks.com


On Mon, Nov 12, 2012 at 11:19 PM, Kuai, Ben ben.k...@sensis.com.au wrote:
 Hi Yonik

 Thanks for the reply.
 My sample query,

 q=cafesort=geodist(geoLocation,-32.123323,108.123323)+ascgroup.field=familyId

 field name=geoLocation type=latLon indexed=true stored=false /
 field name=familyId type=string indexed=true stored=false /

 as long as I remove the group field the query working.

 BTW, I just find out that the version of solr we are using is an old copy of 
 4.0 snapshot before the alpha release. Could that be the problem?  we have 
 some customized parsers so it will take quite some time to upgrade.


 Ben
 
 From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
 [yo...@lucidworks.com]
 Sent: Tuesday, November 13, 2012 6:46 AM
 To: solr-user@lucene.apache.org
 Subject: Re: sort by function error

 On Mon, Nov 12, 2012 at 5:24 AM, Kuai, Ben ben.k...@sensis.com.au wrote:
 more information,  problem only happends when I have both sort by function 
 and grouping in query.

 I haven't been able to duplicate this with a few ad-hoc queries.
 Could you give your complete request (or at least all of the relevant
 grouping and sorting parameters), as well as the field type you are
 grouping on?

 -Yonik
 http://lucidworks.com


RE: sort by function error

2012-11-12 Thread Kuai, Ben
Hi Yonik

Thanks for the reply.
My sample query:

q=cafe&sort=geodist(geoLocation,-32.123323,108.123323)+asc&group.field=familyId

<field name="geoLocation" type="latLon" indexed="true" stored="false" />
<field name="familyId" type="string" indexed="true" stored="false" />

As long as I remove the group field, the query works.

BTW, I just found out that the version of Solr we are using is an old copy of 
a 4.0 snapshot from before the alpha release. Could that be the problem?  We have 
some customized parsers, so it will take quite some time to upgrade. 


Ben

From: ysee...@gmail.com [ysee...@gmail.com] on behalf of Yonik Seeley 
[yo...@lucidworks.com]
Sent: Tuesday, November 13, 2012 6:46 AM
To: solr-user@lucene.apache.org
Subject: Re: sort by function error

On Mon, Nov 12, 2012 at 5:24 AM, Kuai, Ben ben.k...@sensis.com.au wrote:
 more information,  problem only happends when I have both sort by function 
 and grouping in query.

I haven't been able to duplicate this with a few ad-hoc queries.
Could you give your complete request (or at least all of the relevant
grouping and sorting parameters), as well as the field type you are
grouping on?

-Yonik
http://lucidworks.com


RE: sort by function error

2012-11-11 Thread Kuai, Ben
more information,  problem only happends when I have both sort by function and 
grouping in query.



From: Kuai, Ben [ben.k...@sensis.com.au]
Sent: Monday, November 12, 2012 2:12 PM
To: solr-user@lucene.apache.org
Subject: sort by function error

Hi

I am trying to use sort by function, something like sort=sum(field1, field2) asc.

But it is not working and I get the error "SortField needs to be rewritten through 
Sort.rewrite(..) and SortField.rewrite(..)".

Please shed some light on this.

Thanks
Ben

Full exception stack track:
SEVERE: java.lang.IllegalStateException: SortField needs to be rewritten 
through Sort.rewrite(..) and SortField.rewrite(..)
at org.apache.lucene.search.SortField.getComparator(SortField.java:484)
at 
org.apache.lucene.search.grouping.AbstractFirstPassGroupingCollector.init(AbstractFirstPassGroupingCollector.java:82)
at 
org.apache.lucene.search.grouping.TermFirstPassGroupingCollector.init(TermFirstPassGroupingCollector.java:58)
at 
org.apache.solr.search.Grouping$TermFirstPassGroupingCollectorJava6.init(Grouping.java:1009)
at 
org.apache.solr.search.Grouping$CommandField.createFirstPassCollector(Grouping.java:632)
at org.apache.solr.search.Grouping.execute(Grouping.java:301)
at 
org.apache.solr.handler.component.QueryComponent.process(QueryComponent.java:373)
at 
org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:201)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316)
at 
org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:353)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:248)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:225)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:123)
at 
org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:168)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:98)
at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:927)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:407)
at 
org.apache.coyote.http11.AbstractHttp11Processor.process(AbstractHttp11Processor.java:1001)
at 
org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:585)
at 
org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:310)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
at 
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
at java.lang.Thread.run(Thread.java:662)




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
I'm new to Solr... but this is more of a web programming question... so I can get 
in on this :).

Your only option to get the data from Solr sans JavaScript is to use Python 
to pull the results BEFORE the client loads the page.

So, if you are asking whether you can get AJAX-like results (an already loaded page 
pulling info from your Solr server) but without using JavaScript... no, you 
cannot do that. You might be able to hack something ugly together using 
iframes, but trust me, you don't want to. It will look bad, it won't work well, 
and interacting with data in an iframe is nightmarish.

So, basically, if you don't want to use JavaScript, your only option is a total 
page reload every time you need to query Solr (which you then query on the 
Python side.)

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 11:37 AM
To: solr-user@lucene.apache.org
Subject: Re: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Thank you for the reply, but I'm afraid I don't understand :(

This is how things are setup. On my Python website, I have a keyword and 
location box. When clicked, it queries the server via a javascript GET
request, it then sends back the data via Json.

I'm saying that I dont want to be reliant on Javascript. So I'm confused about 
the best way to not only send the request to the Solr server, but also how to 
receive the data.

My guess is that a GET request without javascript is the right way to send 
the request to the Solr server, but then what should Solr be spitting out the 
other end, just an XML file? Then is the idea that my Python site would receive 
this XML data and display it on the site?

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988246.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
Yes (or, at least, I think I understand what you are saying, haha.) Let me 
clarify.

1. Client sends GET request to web server
2. Web server (via Python, in your case, if I remember correctly) queries Solr 
Server
3. Solr server sends response to web server
4. You take that data and put it into the page you are creating server-side
5. Server returns static page to client

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 12:53 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Hi Ben,

Thank you for the reply. So, If I don't want to use Javascript and I want the 
entire page to reload each time, is it being done like this?

1. User submits form via GET
2. Solr server queried via GET
3. Solr server completes query
4. Solr server returns XML output
5. XML data put into results page
6. User shown new results page

Is this basically how it would work if we wanted Javascript out of the equation?

Regards,

James



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988272.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
As far as I know, it is the only way to do this. Look around a bit, Python (or 
PHP, or C, etc., etc.) is able to act as an HTTP client...in fact, that is the 
most common way that web services are consumed. But, we are definitely beyond 
the scope of the Solr list at this point.

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 2:09 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Thank you, that helps. The bit I am still confused about how the server sends 
the response to the server though. I get the impression that there are 
different ways that this could be done, but is sending an XML response back to 
the Python server the best way to do this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html
Sent from the Solr - User mailing list archive at Nabble.com.




RE: Help! Confused about using Jquery for the Search query - Want to ditch it

2012-06-07 Thread Ben Woods
But, check out things like httplib2 and urllib2.

-Original Message-
From: Spadez [mailto:james_will...@hotmail.com]
Sent: Thursday, June 07, 2012 2:09 PM
To: solr-user@lucene.apache.org
Subject: RE: Help! Confused about using Jquery for the Search query - Want to 
ditch it

Thank you, that helps. The bit I am still confused about how the server sends 
the response to the server though. I get the impression that there are 
different ways that this could be done, but is sending an XML response back to 
the Python server the best way to do this?



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Help-Confused-about-using-Jquery-for-the-Search-query-Want-to-ditch-it-tp3988123p3988302.html
Sent from the Solr - User mailing list archive at Nabble.com.




Faceting and Variable Buckets

2012-04-16 Thread Ben McCarthy
Hello,

Just wondering if the following is possible:

We need to produce facets on ranges, but they do not follow a steady increment, 
which is all I can see Solr can produce.  I'm looking for a way to produce 
facets on a price field:

0-1000
1000-5000
5000-1
1-2

Any suggestions, without waiting for 
https://issues.apache.org/jira/browse/SOLR-2366?
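
One workaround that seems possible is a set of explicit facet.query clauses, one per
bucket - a rough SolrJ sketch below, assuming a current SolrJ (the core URL is a
placeholder, and the upper bounds of the last two buckets are guesses since the
ranges above are truncated in the mail):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.client.solrj.response.QueryResponse;

public class VariablePriceBuckets {
    public static void main(String[] args) throws Exception {
        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/stock").build()) {

            SolrQuery q = new SolrQuery("*:*").setRows(0).setFacet(true);
            // one facet.query per irregular bucket; exclusive lower bounds after the first
            q.addFacetQuery("price:[0 TO 1000]");
            q.addFacetQuery("price:{1000 TO 5000]");
            q.addFacetQuery("price:{5000 TO 10000]");
            q.addFacetQuery("price:{10000 TO 20000]");

            QueryResponse rsp = solr.query(q);
            // getFacetQuery() maps each facet.query string to its count
            rsp.getFacetQuery().forEach((range, count) ->
                    System.out.println(range + " = " + count));
        }
    }
}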

Thanks
Ben







RE: need help to integrate SolrJ with my web application.

2012-04-16 Thread Ben McCarthy
Hello,

When I have seen this it usually means the Solr instance you are trying to connect to 
is not available. 

Do you have it installed at:

http://localhost:8080/solr

Try opening that address in your browser.  If you're running the example Solr 
using the embedded Jetty you won't be on 8080 :D

Hope that helps

-Original Message-
From: Vijaya Kumar Tadavarthy [mailto:vijaya.tadavar...@ness.com] 
Sent: 16 April 2012 12:15
To: 'solr-user@lucene.apache.org'
Subject: need help to integrate SolrJ with my web application.

Hi All,

I am trying to integrate solr with my Spring application.

I have performed following steps:

1) Added below list of jars to my webapp lib folder.
apache-solr-cell-3.5.0.jar
apache-solr-core-3.5.0.jar
apache-solr-solrj-3.5.0.jar
commons-codec-1.5.jar
commons-httpclient-3.1.jar
lucene-analyzers-3.5.0.jar
lucene-core-3.5.0.jar
2) I have added Tika jar files for processing binary files.
tika-core-0.10.jar
tika-parsers-0.10.jar
pdfbox-1.6.0.jar
poi-3.8-beta4.jar
poi-ooxml-3.8-beta4.jar
poi-ooxml-schemas-3.8-beta4.jar
poi-scratchpad-3.8-beta4.jar
3) I have modified web.xml added below setup.
<filter>
    <filter-name>SolrRequestFilter</filter-name>
    <filter-class>org.apache.solr.servlet.SolrDispatchFilter</filter-class>
</filter>

<filter-mapping>
    <filter-name>SolrRequestFilter</filter-name>
    <url-pattern>/dataimport</url-pattern>
</filter-mapping>
<servlet>
    <servlet-name>SolrServer</servlet-name>
    <servlet-class>org.apache.solr.servlet.SolrServlet</servlet-class>
    <load-on-startup>1</load-on-startup>
</servlet>
<servlet>
    <servlet-name>SolrUpdate</servlet-name>
    <servlet-class>org.apache.solr.servlet.SolrUpdateServlet</servlet-class>
    <load-on-startup>2</load-on-startup>
</servlet>
<servlet>
    <servlet-name>Logging</servlet-name>
    <servlet-class>org.apache.solr.servlet.LogLevelSelection</servlet-class>
</servlet>
<servlet-mapping>
    <servlet-name>SolrUpdate</servlet-name>
    <url-pattern>/update/*</url-pattern>
</servlet-mapping>
<servlet-mapping>
    <servlet-name>Logging</servlet-name>
    <url-pattern>/admin/logging</url-pattern>
</servlet-mapping>

I am trying to test this setup by running a simple Java program which extracts the 
content of an MS Excel file, as below

public SolrServer createNewSolrServer()
{
  try {
    // setup the server...
    String url = "http://localhost:8080/solr";
    CommonsHttpSolrServer s = new CommonsHttpSolrServer( url );
    s.setConnectionTimeout(100); // 1/10th sec
    s.setDefaultMaxConnectionsPerHost(100);
    s.setMaxTotalConnections(100);

    // where the magic happens
    s.setParser(new BinaryResponseParser());
    s.setRequestWriter(new BinaryRequestWriter());

    return s;
  }
  catch( Exception ex ) {
    throw new RuntimeException( ex );
  }
}

public static void main(String[] args) throws IOException, SolrServerException {
    IndexFilesSolrCell infil = new IndexFilesSolrCell();
    System.setProperty("solr.solr.home", "/WebApp/PCS-DMI/WebContent/resources/solr");
    SolrServer serverone = infil.createNewSolrServer();
    ContentStreamUpdateRequest reqext = new ContentStreamUpdateRequest("/update/extract");
    reqext.addFile(new File("Open Search Approach.xlsx"));
    reqext.setParam(ExtractingParams.EXTRACT_ONLY, "true");
    System.out.println("Content Stream Data path: " + serverone.toString());
    NamedList<Object> result = serverone.request(reqext);
    System.out.println("Result: " + result);
}
I am getting the below exception:
Exception in thread "main" org.apache.solr.common.SolrException: Not Found

Not Found
request: http://localhost:8080/solr/update/extract?extractOnly=true&wt=javabin&version=2
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:432)
at org.apache.solr.client.solrj.impl.CommonsHttpSolrServer.request(CommonsHttpSolrServer.java:246)

Please direct me how to extract content...
I have tried to work with the example in the Solr distribution to extract an MS Excel 
file.
The file extraction was successful and I could check the metadata using the admin 
of the example app.

Thanks,
Vijaya Kumar T
PACIFIC COAST STEEL (Pinnacle) Project
Ness Technologies India Pvt. Ltd
1st  2nd Floor, 2A Maximus Builing, Raheja Mindspace IT Park, Madhapur, 
Hyderabad, 500081, India. | Tel: +91 40 41962079 | Mobile: +91 9963001551 
vijaya.tadavar...@ness.com | www.ness.com

The information contained in this communication is intended solely for the use 
of the individual or entity to whom it is addressed and others authorized to 
receive it. 

RE: Solr data export to CSV File

2012-04-13 Thread Ben McCarthy
A combination of the CSV response writer, and SolrJ to page through all of the 
results, sending each line to something like Apache Commons FileUtils:

  FileUtils.writeStringToFile(new File("output.csv"), outputLine + System.getProperty("line.separator"), true);

Would be quite quick to knock up in Java.
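
A rough sketch of that loop (assuming a reasonably recent SolrJ and commons-io; the
URL, core and field names are placeholders, and there is no CSV escaping here). On
newer Solr versions, cursorMark paging would be preferable to large start offsets
for 36 million documents:

import org.apache.commons.io.FileUtils;
import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.HttpSolrClient;
import org.apache.solr.common.SolrDocument;
import org.apache.solr.common.SolrDocumentList;

import java.io.File;
import java.nio.charset.StandardCharsets;

public class CsvExportSketch {
    public static void main(String[] args) throws Exception {
        String sep = System.getProperty("line.separator");
        File out = new File("output.csv");

        try (HttpSolrClient solr = new HttpSolrClient.Builder(
                "http://localhost:8983/solr/mycore").build()) {

            int rows = 10000;
            for (int start = 0; ; start += rows) {
                SolrQuery q = new SolrQuery("*:*")
                        .setFields("id", "title", "price")   // placeholder fields
                        .setStart(start)
                        .setRows(rows)
                        .setSort("id", SolrQuery.ORDER.asc); // stable order while paging

                SolrDocumentList page = solr.query(q).getResults();
                for (SolrDocument doc : page) {
                    // naive CSV line; real code should escape commas/quotes
                    String line = doc.getFieldValue("id") + ","
                            + doc.getFieldValue("title") + ","
                            + doc.getFieldValue("price");
                    FileUtils.writeStringToFile(out, line + sep, StandardCharsets.UTF_8, true);
                }
                if (page.size() < rows) {
                    break;   // last page reached
                }
            }
        }
    }
}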

Thanks
Ben

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 13 April 2012 13:28
To: solr-user@lucene.apache.org
Subject: Re: Solr data export to CSV File

Does this help?

http://wiki.apache.org/solr/CSVResponseWriter

Best
Erick

On Fri, Apr 13, 2012 at 7:59 AM, Pavnesh pavnesh.ku...@altruistindia.com 
wrote:
 Hi Team,



 A very-very thanks to you guy who had developed such a nice product.

 I have one query regarding Solr: I have approx. 36 million documents in my
 Solr and I want to export all the data to a CSV file, but I have found
 nothing on this, so please help me on this topic.





 Regards

 Pavnesh










Errors during indexing

2012-04-13 Thread Ben McCarthy
Hello

We have just switched to Solr 4 as we needed the ability to return geodist() 
along with our results.

I use a simple multithreaded Java app and Solr to ingest the data.  We keep 
seeing the following:

13-Apr-2012 15:50:10 org.apache.solr.common.SolrException log
SEVERE: null:org.apache.solr.common.SolrException: Error handling 'status' 
action
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:546)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleRequestBody(CoreAdminHandler.java:156)
at 
org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
at 
org.apache.solr.servlet.SolrDispatchFilter.handleAdminRequest(SolrDispatchFilter.java:359)
at 
org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:175)
at 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
at 
org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
at 
org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
at 
org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
at 
org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:127)
at 
org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
at 
org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
at 
org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293)
at 
org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:859)
at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:602)
at 
org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:489)
at java.lang.Thread.run(Thread.java:662)
Caused by: java.io.FileNotFoundException: /usr/solr4/data/index/_2jb.fnm (No 
such file or directory)
at java.io.RandomAccessFile.open(Native Method)
at java.io.RandomAccessFile.<init>(RandomAccessFile.java:216)
at 
org.apache.lucene.store.MMapDirectory.openInput(MMapDirectory.java:219)
at 
org.apache.lucene.codecs.lucene40.Lucene40FieldInfosReader.read(Lucene40FieldInfosReader.java:47)
at 
org.apache.lucene.index.SegmentInfo.loadFieldInfos(SegmentInfo.java:201)
at 
org.apache.lucene.index.SegmentInfo.getFieldInfos(SegmentInfo.java:227)
at org.apache.lucene.index.SegmentInfo.files(SegmentInfo.java:415)
at org.apache.lucene.index.SegmentInfos.files(SegmentInfos.java:756)
at 
org.apache.lucene.index.StandardDirectoryReader$ReaderCommit.<init>(StandardDirectoryReader.java:369)
at 
org.apache.lucene.index.StandardDirectoryReader.getIndexCommit(StandardDirectoryReader.java:354)
at 
org.apache.solr.handler.admin.LukeRequestHandler.getIndexInfo(LukeRequestHandler.java:558)
at 
org.apache.solr.handler.admin.CoreAdminHandler.getCoreStatus(CoreAdminHandler.java:816)
at 
org.apache.solr.handler.admin.CoreAdminHandler.handleStatusAction(CoreAdminHandler.java:537)
... 16 more


This seems to happen when we're using the new admin tool.  I'm checking on the
autocommit handler.

Has anyone seen anything similar?

Thanks
Ben







RE: Simple Slave Replication Question

2012-03-26 Thread Ben McCarthy
Hello,

Had to leave the office so didn't get a chance to reply.  Nothing in the logs.  
Just ran one through from the ingest tool.

Same result: a full copy of the index.

Is it something to do with:

server.commit();
server.optimize();

I call this at the end of the ingestion.

Would optimize then work across the whole index?

Thanks
Ben

-Original Message-
From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
Sent: 23 March 2012 15:10
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

Also, what happens if, instead of adding the 40K docs you add just one and 
commit?

2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com

 Have you changed the mergeFactor or are you using 10 as in the example
 solrconfig?

 What do you see in the slave's log during replication? Do you see any
 line like Skipping download for...?


 On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy 
 ben.mccar...@tradermedia.co.uk wrote:

 I just have a index directory.

 I push the documents through with a change to a field.  Im using
 SOLRJ to do this.  Im using the guide from the wiki to setup the
 replication.  When the feed of updates to the master finishes I call
 a commit again using SOLRJ.  I then have a poll period of 5 minutes
 from the slave.  When it kicks in I see a new version of the index
 and then it copys the full 5gb index.

 Thanks
 Ben

 -Original Message-
 From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
 Sent: 23 March 2012 14:29
 To: solr-user@lucene.apache.org
 Subject: Re: Simple Slave Replication Question

 Hi Ben, only new segments are replicated from master to slave. In a
 situation where all the segments are new, this will cause the index
 to be fully replicated, but this rarely happen with incremental
 updates. It can also happen if the slave Solr assumes it has an invalid 
 index.
 Are you committing or optimizing on the slaves? After replication,
 the index directory on the slaves is called index or index.timestamp?

 Tomás

 On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy 
 ben.mccar...@tradermedia.co.uk wrote:

  So do you just simpy address this with big nic and network pipes.
 
  -Original Message-
  From: Martin Koch [mailto:m...@issuu.com]
  Sent: 23 March 2012 14:07
  To: solr-user@lucene.apache.org
  Subject: Re: Simple Slave Replication Question
 
  I guess this would depend on network bandwidth, but we move around
  150G/hour when hooking up a new slave to the master.
 
  /Martin
 
  On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy 
  ben.mccar...@tradermedia.co.uk wrote:
 
   Hello,
  
   Im looking at the replication from a master to a number of slaves.
   I have configured it and it appears to be working.  When updating
   40K records on the master is it standard to always copy over the
   full index, currently 5gb in size.  If this is standard what do
   people do who have massive 200gb indexs, does it not take a while
   to bring the
  slaves inline with the master?
  
   Thanks
   Ben
  
   
  
  

RE: Simple Slave Replication Question

2012-03-26 Thread Ben McCarthy
That's great information.

Thanks for all the help and guidance, it's been invaluable.

Thanks
Ben

-Original Message-
From: Erick Erickson [mailto:erickerick...@gmail.com]
Sent: 26 March 2012 12:21
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

It's the optimize step. Optimize essentially forces all the segments to be 
copied into a single new segment, which means that your entire index will be 
replicated to the slaves.

In recent Solrs, there's usually no need to optimize, so unless and until you
can demonstrate a noticeable change, I'd just leave the optimize step off. In
fact, trunk renames it to forceMerge or something, just because it's so common
for people to think "of course I want to optimize my index!" and get the
unintended consequences you're seeing, even though the optimize doesn't
actually do that much good in most cases.

Some people just do the optimize once a day (or week or whatever) during 
off-peak hours as a compromise.
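
In code terms the change is tiny. A sketch of the end-of-ingest step from
earlier in the thread, with the optimize left out (server is the SolrJ client
from the earlier snippets):

  // commit so the new documents become searchable; only the new/changed
  // segments will then be replicated to the slaves
  server.commit();
  // server.optimize();   // if wanted at all, run this separately during off-peak hours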

Best
Erick


On Mon, Mar 26, 2012 at 5:02 AM, Ben McCarthy ben.mccar...@tradermedia.co.uk 
wrote:
 Hello,

 Had to leave the office so didn't get a chance to reply.  Nothing in the 
 logs.  Just ran one through from the ingest tool.

 Same results full copy of the index.

 Is it something to do with:

 server.commit();
 server.optimize();

 I call this at the end of the ingestion.

 Would optimize then work across the whole index?

 Thanks
 Ben

 -Original Message-
 From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
 Sent: 23 March 2012 15:10
 To: solr-user@lucene.apache.org
 Subject: Re: Simple Slave Replication Question

 Also, what happens if, instead of adding the 40K docs you add just one and 
 commit?

 2012/3/23 Tomás Fernández Löbbe tomasflo...@gmail.com

 Have you changed the mergeFactor or are you using 10 as in the
 example solrconfig?

 What do you see in the slave's log during replication? Do you see any
 line like Skipping download for...?


 On Fri, Mar 23, 2012 at 11:57 AM, Ben McCarthy 
 ben.mccar...@tradermedia.co.uk wrote:

 I just have a index directory.

 I push the documents through with a change to a field.  Im using
 SOLRJ to do this.  Im using the guide from the wiki to setup the
 replication.  When the feed of updates to the master finishes I call
 a commit again using SOLRJ.  I then have a poll period of 5 minutes
 from the slave.  When it kicks in I see a new version of the index
 and then it copys the full 5gb index.

 Thanks
 Ben

 -Original Message-
 From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
 Sent: 23 March 2012 14:29
 To: solr-user@lucene.apache.org
 Subject: Re: Simple Slave Replication Question

 Hi Ben, only new segments are replicated from master to slave. In a
 situation where all the segments are new, this will cause the index
 to be fully replicated, but this rarely happen with incremental
 updates. It can also happen if the slave Solr assumes it has an invalid 
 index.
 Are you committing or optimizing on the slaves? After replication,
 the index directory on the slaves is called index or index.timestamp?

 Tomás

 On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy 
 ben.mccar...@tradermedia.co.uk wrote:

  So do you just simpy address this with big nic and network pipes.
 
  -Original Message-
  From: Martin Koch [mailto:m...@issuu.com]
  Sent: 23 March 2012 14:07
  To: solr-user@lucene.apache.org
  Subject: Re: Simple Slave Replication Question
 
  I guess this would depend on network bandwidth, but we move around
  150G/hour when hooking up a new slave to the master.
 
  /Martin
 
  On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy 
  ben.mccar...@tradermedia.co.uk wrote:
 
   Hello,
  
   Im looking at the replication from a master to a number of slaves.
   I have configured it and it appears to be working.  When
   updating 40K records on the master is it standard to always copy
   over the full index, currently 5gb in size.  If this is standard
   what do people do who have massive 200gb indexs, does it not
   take a while to bring the
  slaves inline with the master?
  
   Thanks
   Ben
  
   
  
  

Simple Slave Replication Question

2012-03-23 Thread Ben McCarthy
Hello,

I'm looking at replication from a master to a number of slaves.  I have
configured it and it appears to be working.  When updating 40K records on the
master, is it standard to always copy over the full index, currently 5 GB in
size?  If this is standard, what do people do who have massive 200 GB indexes?
Does it not take a while to bring the slaves in line with the master?

Thanks
Ben







RE: Simple Slave Replication Question

2012-03-23 Thread Ben McCarthy
So do you just simply address this with big NICs and network pipes?

-Original Message-
From: Martin Koch [mailto:m...@issuu.com]
Sent: 23 March 2012 14:07
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

I guess this would depend on network bandwidth, but we move around 150G/hour 
when hooking up a new slave to the master.

/Martin

On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy  
ben.mccar...@tradermedia.co.uk wrote:

 Hello,

 Im looking at the replication from a master to a number of slaves.  I
 have configured it and it appears to be working.  When updating 40K
 records on the master is it standard to always copy over the full
 index, currently 5gb in size.  If this is standard what do people do
 who have massive 200gb indexs, does it not take a while to bring the slaves 
 inline with the master?

 Thanks
 Ben

 





RE: Simple Slave Replication Question

2012-03-23 Thread Ben McCarthy
I just have an index directory.

I push the documents through with a change to a field.  I'm using SolrJ to do
this.  I'm using the guide from the wiki to set up the replication.  When the
feed of updates to the master finishes I call a commit again using SolrJ.  I
then have a poll period of 5 minutes on the slave.  When it kicks in I see a
new version of the index and then it copies the full 5 GB index.

Thanks
Ben

-Original Message-
From: Tomás Fernández Löbbe [mailto:tomasflo...@gmail.com]
Sent: 23 March 2012 14:29
To: solr-user@lucene.apache.org
Subject: Re: Simple Slave Replication Question

Hi Ben, only new segments are replicated from master to slave. In a situation
where all the segments are new, this will cause the index to be fully
replicated, but this rarely happens with incremental updates. It can also happen
if the slave Solr assumes it has an invalid index.
Are you committing or optimizing on the slaves? After replication, is the index
directory on the slaves called index or index.<timestamp>?
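
For reference, this is roughly the wiki-style replication setup being discussed;
the master URL, poll interval and conf file list below are placeholders, not the
actual config from this thread:

  <requestHandler name="/replication" class="solr.ReplicationHandler">
    <lst name="master">
      <str name="replicateAfter">commit</str>
      <str name="confFiles">schema.xml,stopwords.txt</str>
    </lst>
    <lst name="slave">
      <str name="masterUrl">http://master-host:8080/solr/replication</str>
      <str name="pollInterval">00:05:00</str>
    </lst>
  </requestHandler>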

Tomás

On Fri, Mar 23, 2012 at 11:18 AM, Ben McCarthy  
ben.mccar...@tradermedia.co.uk wrote:

 So do you just simpy address this with big nic and network pipes.

 -Original Message-
 From: Martin Koch [mailto:m...@issuu.com]
 Sent: 23 March 2012 14:07
 To: solr-user@lucene.apache.org
 Subject: Re: Simple Slave Replication Question

 I guess this would depend on network bandwidth, but we move around
 150G/hour when hooking up a new slave to the master.

 /Martin

 On Fri, Mar 23, 2012 at 12:33 PM, Ben McCarthy 
 ben.mccar...@tradermedia.co.uk wrote:

  Hello,
 
  Im looking at the replication from a master to a number of slaves.
  I have configured it and it appears to be working.  When updating
  40K records on the master is it standard to always copy over the
  full index, currently 5gb in size.  If this is standard what do
  people do who have massive 200gb indexs, does it not take a while to
  bring the
 slaves inline with the master?
 
  Thanks
  Ben
 
  
 
 



Data Import Handler Delta Import and Debug Mode Help

2011-12-08 Thread Ben McCarthy
Good Afternoon,



I'm looking at delta imports via the DataImportHandler.  I was running Solr 1.4.1
but just upgraded to 3.5.  Previously I was able to run debug and verbose
from:



http://localhost:8080/solr/admin/dataimport.jsp?handler=/advert



But since upgrading, when choosing these options the right panel does not
populate with anything.  Am I missing something from the upgrade? I copied
all the relevant jars to my classpath.


This is proving a problem as I'm trying to debug why my delta import is not
picking up any records:



  <entity name="Stock"
          pk="ID"
          query="select * from stock_item s join
                 advert_detail a on a.stock_item_id=s.id where
                 a.Destination='ConsumerWebsite'"
          deltaImportQuery="select * from
                 stock_item s join advert_detail a on a.stock_item_id=s.id where
                 a.Destination='foo' and s.id='${dataimporter.delta.ID}'"
          deltaQuery="select s.ID from stock_item s
                 where s.last_updated &gt;
                 to_date('${dataimporter.last_index_time}','-MM-DD hh24:mm:ss')"
          dataSource="pos_ds">





The entity does have two nested entities within it.



When I run the delta query directly against the DB I get back the expected 100
stock IDs.





Any help would be appreciated.
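
For what it's worth, the handler can also be driven and debugged directly over
HTTP, which sidesteps the admin page entirely. Something along these lines,
using the handler path registered in solrconfig.xml (whether your version
expects debug=true or debug=on is worth double-checking against the DIH wiki):

  http://localhost:8080/solr/advert?command=delta-import&debug=true&verbose=true&commit=false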



Thanks

Ben


RE: changing the root directory where solrCloud stores info inside zookeeper File system

2011-08-02 Thread Yatir Ben Shlomo
Thanks a lot Mark,
Since my SolrCloud code was old, I tried downloading and building the
newest code from here:
https://svn.apache.org/repos/asf/lucene/dev/trunk/
I am using Tomcat 6.
I manually created the /sc sub-directory in my ZooKeeper ensemble
file system.
I used this connection string for my ZK ensemble:
zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
but I still get the same problem.
Here is the entire catalina.out log with the exception:

Using CATALINA_BASE:   /opt/tomcat6
Using CATALINA_HOME:   /opt/tomcat6
Using CATALINA_TMPDIR: /opt/tomcat6/temp
Using JRE_HOME:/usr/java/default/
Using CLASSPATH:   /opt/tomcat6/bin/bootstrap.jar
Java HotSpot(TM) 64-Bit Server VM warning: Failed to reserve shared memory
(errno = 12).
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.AprLifecycleListener init
INFO: The APR based Apache Tomcat Native library which allows optimal
performance in production environments was not found on the
java.library.path:
/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/usr/java/jdk1.6.0_21/jre/lib/a
md64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:/usr/java/packages/lib/amd64:/
usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8983
Aug 2, 2011 4:28:46 AM org.apache.coyote.http11.Http11Protocol init
INFO: Initializing Coyote HTTP/1.1 on http-8080
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.Catalina load
INFO: Initialization processed in 448 ms
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardService start
INFO: Starting service Catalina
Aug 2, 2011 4:28:46 AM org.apache.catalina.core.StandardEngine start
INFO: Starting Servlet Engine: Apache Tomcat/6.0.29
Aug 2, 2011 4:28:46 AM org.apache.catalina.startup.HostConfig
deployDescriptor
INFO: Deploying configuration descriptor solr1.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.servlet.SolrDispatchFilter init
INFO: SolrDispatchFilter.init()
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer$Initializer
initialize
INFO: looking for solr.xml: /home/tomcat/solrCloud1/solr.xml
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer init
INFO: New CoreContainer 853527367
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader
locateSolrHome
INFO: Using JNDI solr.home: /home/tomcat/solrCloud1
Aug 2, 2011 4:28:46 AM org.apache.solr.core.SolrResourceLoader init
INFO: Solr home set to '/home/tomcat/solrCloud1/'
Aug 2, 2011 4:28:46 AM org.apache.solr.cloud.SolrZkServerProps
getProperties
INFO: Reading configuration from: /home/tomcat/solrCloud1/zoo.cfg
Aug 2, 2011 4:28:46 AM org.apache.solr.core.CoreContainer initZooKeeper
INFO: Zookeeper client=zook1:2181/sc,zook2:2181/sc,zook3:2181/sc
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:zookeeper.version=3.3.1-942149, built on
05/07/2010 17:14 GMT
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:host.name=ob1079.nydc1.outbrain.com
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.version=1.6.0_21
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.vendor=Sun Microsystems Inc.
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.home=/usr/java/jdk1.6.0_21/jre
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.class.path=/opt/tomcat6/bin/bootstrap.jar
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client
environment:java.library.path=/usr/java/jdk1.6.0_21/jre/lib/amd64/server:/
usr/java/jdk1.6.0_21/jre/lib/amd64:/usr/java/jdk1.6.0_21/jre/../lib/amd64:
/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.io.tmpdir=/opt/tomcat6/temp
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:java.compiler=NA
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.name=Linux
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.arch=amd64
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:os.version=2.6.18-194.8.1.el5
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.name=tomcat
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client environment:user.home=/home/tomcat
Aug 2, 2011 4:28:46 AM org.apache.zookeeper.Environment logEnv
INFO: Client 

changing the root directory where solrCloud stores info inside zookeeper File system

2011-07-26 Thread Yatir Ben Shlomo
Hi!

I am using SolrCloud with a ZooKeeper ensemble of 3.

I noticed that SolrCloud stores information directly under the root dir in the
ZooKeeper file system:

/config /live_nodes /collections

In my setup ZooKeeper is also used by other modules, so I would like SolrCloud
to store everything under /solrCloud/ or something similar.



Is there a property for that, or do I need to custom code it?

Thanks


Solr Request Logging

2011-07-14 Thread Ben Roubicek
I am using the trunk version of Solr and I am getting a ton more logging
information than I really care to see, and definitely more than 1.4, but I can't
really see a way to change it.

A little background:
I am faceting on fields that have a very high number of distinct values and 
also returning large numbers of documents in a sharded environment.

For example:
INFO: [core1] webapp=/solr path=/select
params={facet=true&attr_lng_rng_low.revenue__terms=<a lot of distinct values>&<moreParams> ...}

Another example:
INFO: [core1] webapp=/solr path=/select
params={facet=false&facet.mincount=1&ids=<a lot of document ids>&<moreParams>...}

In just a few minutes, I have racked up 10 MB of logs in my dev environment.
Any ideas for a sane way of handling these messages?  I imagine it's slowing
down Solr as well.
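
One option is simply to raise the log level for the logger that emits those
request lines. A sketch using java.util.logging properties; the logger name is
an assumption, so check which logger produces the request lines in your build:

  # logging.properties (JUL)
  .level = INFO
  org.apache.solr.core.SolrCore.level = WARNING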

Thanks
-Ben


Localized alphabetical order

2011-04-22 Thread Ben Preece
As someone who's new to Solr/Lucene, I'm having trouble finding 
information on sorting results in localized alphabetical order. I've 
ineffectively searched the wiki and the mail archives.


I'm thinking for example about Hawai'ian, where mīka (with an i-macron) 
comes after mika (i without the macron) but before miki (also without 
the macron), or about Welsh, where the digraphs (ch, dd, etc.) are 
treated as single letters, or about Ojibwe, where the apostrophe ' is a 
letter which sorts between h and i.


How do non-English languages typically handle this?
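
(One avenue that might be worth exploring is Solr's locale-sensitive collation
support, sketched below as an untested example. Whether the JDK actually ships
collation rules for Hawai'ian, Welsh or Ojibwe is a separate question; custom
RuleBasedCollator rules or the ICU variant may be needed.)

  <fieldType name="collated_sort" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.CollationKeyFilterFactory" language="haw" strength="primary"/>
    </analyzer>
  </fieldType>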

-Ben


Field Analyzers: which values are indexed?

2011-04-13 Thread Ben Davies
Hi there,

Just a quick question that the wiki page (
http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem to
answer very well.

Given an analyzer that has zero or more Char Filter Factories, one
Tokenizer Factory, and zero or more Token Filter Factories, which value(s)
are indexed?

Is every value that is produced by each char filter, tokenizer, and filter
indexed?
Or is only the final value, after completing the whole chain, indexed?

Cheers,
Ben


Re: Field Analyzers: which values are indexed?

2011-04-13 Thread Ben Davies
Thanks both for your replies.

Erick,
Yep, I use the Analysis page extensively, but what I was directly looking
for was whether all, or only the last line, of the values given by the analysis
page were eventually indexed.
I think we've concluded it's only the last line.
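
As a small illustration (the field type name is invented): in a chain like the
one below, the char filter and tokenizer outputs are only intermediate stages
that the Analysis page displays; what actually ends up in the index are the
tokens that come out of the final filter.

  <fieldType name="text_example" class="solr.TextField">
    <analyzer>
      <charFilter class="solr.HTMLStripCharFilterFactory"/>
      <tokenizer class="solr.WhitespaceTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldType>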

Cheers,
Ben

On Wed, Apr 13, 2011 at 2:41 PM, Erick Erickson erickerick...@gmail.comwrote:

 CharFilterFactories are applied to the raw input before tokenization.
 Each token output from the tokenization is then sent through
 the rest of the chain.

 The Analysis page available from the Solr admin page is
 invaluable in answering in great detail what each part of
 an analysis chain does.

 TokenFilterFactories are applied to each token emitted from
 the tokenizer, and this includes the similar
 PatternReplaceFilterFactory. The difference is that the
 PatternReplaceCharFilterFactory is applied before tokenization
 to the entire input stream and PatternReplaceFilterFactory
 is applied to each token emitted by the tokenizer.

 And to make it even more fun, you can do both!

 Best
 Erick

 On Wed, Apr 13, 2011 at 8:14 AM, Ben Davies ben.dav...@gmail.com wrote:

  Hi there,
 
  Just a quick question that the wiki page (
  http://wiki.apache.org/solr/AnalyzersTokenizersTokenFilters) didn't seem
  to
  answer very well.
 
  Given an analyzer that has  zero or more Char Filter Factories, one
  Tokenizer Factory, and zero or more Token Filter Factories, which
 value(s)
  are indexed?
 
  Is every value that is produced from each char filter, tokenizer, and
  filter
  indexed?
  Or is the only the final value after completing the whole chain indexed?
 
  Cheers,
  Ben
 



Re: question on solr.ASCIIFoldingFilterFactory

2011-04-05 Thread Ben Davies
I can't remember where I read it, but I think MappingCharFilterFactory is
preferred.
There is an example in the example schema:

<charFilter class="solr.MappingCharFilterFactory"
    mapping="mapping-ISOLatin1Accent.txt"/>

From this, I get:
org.apache.solr.analysis.MappingCharFilterFactory
{mapping=mapping-ISOLatin1Accent.txt}
|text|despues|



On Tue, Apr 5, 2011 at 5:06 PM, Nemani, Raj raj.nem...@turner.com wrote:

 All,

 I am using solr.ASCIIFoldingFilterFactory to perform accent insensitive
 search.  One of the words that got indexed as part my indexing process is
 después.  Having used the ASCIIFoldingFilterFactory,I expected that If I
 searched for word despues I should have the document containing the word
 después show up in the results but that was not the case.  Then I used the
 Analysis.jsp to analyze después and noticed that the
 ASCIIFoldingFilterFactory folded después as despue.



 If I repeat the above exercise for the word Imágenes, then Analysis.jsp
 tell me that the ASCIIFoldingFilterFactory folded Imágenes as imagen.
  But I can search for Imagenes and get the correct results.



 I am not familiar with Spanish but I found the above behavior confusing.
  Can anybody please explain the behavior described above?



 Thank a million in advance

 Raj






MoreLikeThis with document that has not been indexed

2011-03-30 Thread Ben Anhalt
Hello,

It is currently possible to use the MoreLikeThis handler to find documents
similar to a given document in the index.

Is there any way to feed the handler a new document in XML or JSON (as one
would do for adding to the index) and have it find similar documents without
indexing the target document?  I understand that it is possible to do a MLT
query using free text, but I want to utilize structured data.

Thanks,

Ben

-- 
Ben Anhalt
ben.anh...@gmail.com
Mi parolas Esperante.


Cache size

2011-02-08 Thread Mehdi Ben Haj Abbes
Hi folks,

Is there any way to know the size *in bytes* occupied by a cache (filter
cache, doc cache ...)? I can't find such information within the stats page.

Regards

-- 
Mehdi BEN HAJ ABBES


Re: xpath processing

2010-10-23 Thread Ben Boggess
 processor="FileListEntityProcessor" fileName=".*xml" recursive="true"

Shouldn't this be fileName=*.xml?

Ben

On Oct 22, 2010, at 10:52 PM, pghorp...@ucla.edu wrote:

 
 
 <dataConfig>
   <dataSource name="myfilereader" type="FileDataSource"/>
   <document>
     <entity name="f" rootEntity="false" dataSource="null"
             processor="FileListEntityProcessor" fileName=".*xml" recursive="true"
             baseDir="C:\data\sample_records\mods\starr">
       <entity name="x" dataSource="myfilereader" processor="XPathEntityProcessor"
               url="${f.fileAbsolutePath}" stream="false" forEach="/mods"
               transformer="DateFormatTransformer,RegexTransformer,TemplateTransformer">
         <field column="id" template="${f.file}"/>
         <field column="collectionKey" template="starr"/>
         <field column="collectionName" template="starr"/>
         <field column="fileAbsolutePath" template="${f.fileAbsolutePath}"/>
         <field column="fileName" template="${f.file}"/>
         <field column="fileSize" template="${f.fileSize}"/>
         <field column="fileLastModified" template="${f.fileLastModified}"/>
         <field column="classification_keyword" xpath="/mods/classification"/>
         <field column="accessCondition_keyword" xpath="/mods/accessCondition"/>
         <field column="nameNamePart_s" xpath="/mods/name/namePart[@type = 'date']"/>
       </entity>
     </entity>
   </document>
 </dataConfig>
 
 Quoting Ken Stanley doh...@gmail.com:
 
 Parinita,
 
 In its simplest form, what does your entity definition for DIH look like;
 also, what does one record from your xml look like? We need more information
 before we can really be of any help. :)
 
 - Ken
 
 It looked like something resembling white marble, which was
 probably what it was: something resembling white marble.
-- Douglas Adams, The Hitchhikers Guide to the Galaxy
 
 
 On Fri, Oct 22, 2010 at 8:00 PM, pghorp...@ucla.edu wrote:
 
 Quoting pghorp...@ucla.edu:
 Can someone help me please?
 
 
 I am trying to import mods xml data in solr using  the xml/http datasource
 
 This does not work with XPathEntityProcessor of the data import handler
 xpath="/mods/name/namePart[@type = 'date']"
 
 I actually have 143 records with type attribute as 'date' for element
 namePart.
 
 Thank you
 Parinita
 
 
 
 
 
 
 


Multiple indexes inside a single core

2010-10-20 Thread ben boggess
We are trying to convert a Lucene-based search solution to a
Solr/Lucene-based solution.  The problem we have is that we currently have
our data split into many indexes and Solr expects things to be in a single
index unless you're sharding.  In addition to this, our indexes wouldn't
work well using the distributed search functionality in Solr because the
documents are not evenly or randomly distributed.  We are currently using
Lucene's MultiSearcher to search over subsets of these indexes.

I know this has been brought up a number of times in previous posts and the
typical response is that the best thing to do is to convert everything into
a single index.  One of the major reasons for having the indexes split up
the way we do is because different types of data need to be indexed at
different intervals.  You may need one index to be updated every 20 minutes
and another is only updated every week.  If we move to a single index, then
we will constantly be warming and replacing searchers for the entire
dataset, and will essentially render the searcher caches useless.  If we
were able to have multiple indexes, they would each have a searcher and
updates would be isolated to a subset of the data.

The other problem is that we will likely need to shard this large single
index and there isn't a clean way to shard randomly and evenly across the of
the data.  We would, however like to shard a single data type.  If we could
use multiple indexes, we would likely be also sharding a small sub-set of
them.

Thanks in advance,

Ben


Re: Multiple indexes inside a single core

2010-10-20 Thread Ben Boggess
Thanks Erick.  The problem with multiple cores is that the documents are scored 
independently in each core.  I would like to be able to search across both 
cores and have the scores 'normalized' in a way that's similar to what Lucene's 
MultiSearcher would do.  As far as I understand, multiple cores would likely
result in seriously skewed scores in my case since the documents are not 
distributed evenly or randomly.  I could have one core/index with 20 million 
docs and another with 200.

I've poked around in the code and this feature doesn't seem to exist.  I would 
be happy with finding a decent place to try to add it.  I'm not sure if there 
is a clean place for it.

Ben

On Oct 20, 2010, at 8:36 PM, Erick Erickson erickerick...@gmail.com wrote:

 It seems to me that multiple cores are along the lines you
 need, a single instance of Solr that can search across multiple
 sub-indexes that do not necessarily share schemas, and are
 independently maintainable..
 
 This might be a good place to start: http://wiki.apache.org/solr/CoreAdmin
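 
  As a rough sketch (core names here are invented), a multi-core solr.xml along
  these lines lets each core be indexed, committed and warmed on its own schedule:
 
    <solr persistent="true">
      <cores adminPath="/admin/cores">
        <core name="fast_changing" instanceDir="fast_changing"/>
        <core name="weekly" instanceDir="weekly"/>
      </cores>
    </solr>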
 
 HTH
 Erick
 
 On Wed, Oct 20, 2010 at 3:23 PM, ben boggess ben.bogg...@gmail.com wrote:
 
 We are trying to convert a Lucene-based search solution to a
 Solr/Lucene-based solution.  The problem we have is that we currently have
 our data split into many indexes and Solr expects things to be in a single
 index unless you're sharding.  In addition to this, our indexes wouldn't
 work well using the distributed search functionality in Solr because the
 documents are not evenly or randomly distributed.  We are currently using
 Lucene's MultiSearcher to search over subsets of these indexes.
 
 I know this has been brought up a number of times in previous posts and the
 typical response is that the best thing to do is to convert everything into
 a single index.  One of the major reasons for having the indexes split up
 the way we do is because different types of data need to be indexed at
 different intervals.  You may need one index to be updated every 20 minutes
 and another is only updated every week.  If we move to a single index, then
 we will constantly be warming and replacing searchers for the entire
 dataset, and will essentially render the searcher caches useless.  If we
 were able to have multiple indexes, they would each have a searcher and
 updates would be isolated to a subset of the data.
 
 The other problem is that we will likely need to shard this large single
 index and there isn't a clean way to shard randomly and evenly across the
 of
 the data.  We would, however like to shard a single data type.  If we could
 use multiple indexes, we would likely be also sharding a small sub-set of
 them.
 
 Thanks in advance,
 
 Ben
 


Re: How to delete a SOLR document if that particular data doesnt exist in DB?

2010-10-20 Thread ben boggess
 Now my question is.. Is there a way I can use preImportDeleteQuery to
 delete
 the documents from SOLR for which the data doesnt exist in back end db? I
 dont have anything called delete status in DB, instead I need to get all
 the
 UID's from SOLR document and compare it with all the UID's in back end and
 delete the data from SOLR document for the UID's which is not present in
 DB.

I've done something like this with raw Lucene and I'm not sure how or if you
could do it with Solr as I'm relatively new to it.

We stored a timestamp for when we started to import and stored an update
timestamp field for every document added to the index.  After the data
import, we did a delete by query that matched all documents with a timestamp
older than when we started.  The assumption being that if we didn't update
the timestamp during the load, then the record must have been deleted from
the database.
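
A rough sketch of how that could look with SolrJ; the field name last_indexed_dt
and the timestamp format are assumptions for illustration, not taken from the
setup described above:

  import java.text.SimpleDateFormat;
  import java.util.Date;
  import java.util.TimeZone;
  import org.apache.solr.client.solrj.SolrServer;

  public class OrphanCleanup {
    // Remove every document whose timestamp was not touched by the import
    // that began at importStart.
    public static void deleteOrphans(SolrServer server, Date importStart) throws Exception {
      SimpleDateFormat iso = new SimpleDateFormat("yyyy-MM-dd'T'HH:mm:ss'Z'");
      iso.setTimeZone(TimeZone.getTimeZone("UTC"));
      server.deleteByQuery("last_indexed_dt:[* TO " + iso.format(importStart) + "]");
      server.commit();
    }
  }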

Hope this helps.

Ben

On Wed, Oct 20, 2010 at 8:05 PM, Erick Erickson erickerick...@gmail.comwrote:

 We are indexing multiple data by data types hence cant delete the index
 and
 do a complete re-indexing each week also we want to delete the orphan solr
 documents (for which the data is not present in back end DB) on a daily
 basis.

 Can you make delete by query work? Something like delete all Solr docs of
 a certain type and do a full re-index of just that type?

 I have no idea whether this is practical or not

 But your solution also works. There's really no way Solr #can# know about
 deleted database records, especially since the uniqueKey field is
 completely
 arbitrarily defined.

 Best
 Erick

 On Wed, Oct 20, 2010 at 10:51 AM, bbarani bbar...@gmail.com wrote:

 
  Hi,
 
  I have a very common question but couldnt find any post related to my
  question in this forum,
 
  I am currently initiating a full import each week but the data that have
  been deleted in the source is not update in my document as I am using
  clean=false.
 
  We are indexing multiple data by data types hence cant delete the index
 and
  do a complete re-indexing each week also we want to delete the orphan
 solr
  documents (for which the data is not present in back end DB) on a daily
  basis.
 
  Now my question is.. Is there a way I can use preImportDeleteQuery to
  delete
  the documents from SOLR for which the data doesnt exist in back end db? I
  dont have anything called delete status in DB, instead I need to get all
  the
  UID's from SOLR document and compare it with all the UID's in back end
 and
  delete the data from SOLR document for the UID's which is not present in
  DB.
 
  Any suggestion / ideas would be of great help.
 
  Note: Currently I have developed a simple program which will fetch the
  UID's
  from SOLR document and then connect to backend DB to check the orphan
 UID's
  and delete the documents from SOLR index corresponding to orphan UID's. I
  just dont want to re-invent the wheel if this feature is already present
 in
  SOLR as I need to do more testing in terms of performance / scalability
 for
  my program..
 
  Thanks,
  Barani
 
 
  --
  View this message in context:
 
 http://lucene.472066.n3.nabble.com/How-to-delete-a-SOLR-document-if-that-particular-data-doesnt-exist-in-DB-tp1739222p1739222.html
  Sent from the Solr - User mailing list archive at Nabble.com.
 



possible bug in zookeeper / solrCloud ?

2010-09-14 Thread Yatir Ben Shlomo
Hi, I am using SolrCloud with an ensemble of 3 ZooKeeper instances.

I am performing survivability tests:
taking one of the ZooKeeper instances down, I would expect the client to use a
different ZooKeeper server instance.

But as you can see in the logs attached below,
depending on which instance I choose to take down (in my case, the last one in
the list of ZooKeeper servers),
the client keeps insisting on the same ZooKeeper server ("Attempting
connection to server zook3/192.168.252.78:2181")
and does not switch to a different one.
Does anyone have an idea about this?

SolrCloud is currently using zookeeper-3.2.2.jar.
Is this a known bug that was fixed in later versions (3.3.1)?

Thanks in advance,
Yatir


Logs:

Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):999)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:20 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):1004)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
INFO: Attempting connection to server zook3/192.168.252.78:2181
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Exception closing session 0x32b105244a20001 to 
sun.nio.ch.selectionkeyi...@3ca58cbf
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):999)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):1004)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category info
INFO: Attempting connection to server zook3/192.168.252.78:2181
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Exception closing session 0x32b105244a2 to 
sun.nio.ch.selectionkeyi...@3960f81b
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.$$YJP$$checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.checkConnect(SocketChannelImpl.java)
at 
sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:574)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):933)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown input
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownInput(SocketChannelImpl.java:638)
at sun.nio.ch.SocketAdaptor.shutdownInput(SocketAdaptor.java:360)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):999)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)
Sep 14, 2010 9:02:22 AM org.apache.log4j.Category warn
WARNING: Ignoring exception during shutdown output
java.nio.channels.ClosedChannelException
at 
sun.nio.ch.SocketChannelImpl.shutdownOutput(SocketChannelImpl.java:649)
at sun.nio.ch.SocketAdaptor.shutdownOutput(SocketAdaptor.java:368)
at 
org.apache.zookeeper.ClientCnxn$SendThread.cleanup(zookeeper:ClientCnxn.java):1004)
at 
org.apache.zookeeper.ClientCnxn$SendThread.run(zookeeper:ClientCnxn.java):970)



solrCloud zookeepr related excpetions

2010-08-25 Thread Yatir Ben Shlomo
Hi, I am running a ZooKeeper ensemble of 3 instances
and have set up SolrCloud to work with it (2 masters, 2 slaves);
on each master machine I have 2 shards (4 shards in total).
On one of the masters I keep noticing ZooKeeper-related exceptions which I
can't understand:
one appears to be a TIME OUT in (ClientCnxn.java):906,
and the other is java.lang.IllegalArgumentException: Path cannot be null
(PathUtils.java:45).

Here are my logs (I set the log level to FINE on the zookeeper package).

Can anyone identify the issue?



FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null 
serverPath:null finished:false header:: -8,101  replyHeader:: -8,-1,0  
request:: 
30064776552,v{'/collections},v{},v{'/collections/ENPwl/shards/ENPWL1,'/collections/ENPwl/shards/ENPWL4,'/collections/ENPwl/shards/ENPWL2,'/collections,'/collections/ENPwl/shards/ENPWL3,'/collections/ENPwlMaster/shards/ENPWLMaster_3,'/collections/ENPwlMaster/shards/ENPWLMaster_4,'/live_nodes,'/collections/ENPwlMaster/shards/ENPWLMaster_1,'/collections/ENPwlMaster/shards/ENPWLMaster_2}
  response:: null
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category debug
FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null 
serverPath:null finished:false header:: 540,8  replyHeader:: 540,-1,0  
request:: '/collections,F  response:: v{'ENPwl,'ENPwlMaster}
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category error
SEVERE: Error while calling watcher
java.lang.IllegalArgumentException: Path cannot be null
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
at 
org.apache.zookeeper.ZooKeeper.getChildren(zookeeper:ZooKeeper.java):1196)
at 
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:200)
at 
org.apache.solr.common.cloud.ZkStateReader$5.process(ZkStateReader.java:315)
at 
org.apache.zookeeper.ClientCnxn$EventThread.run(zookeeper:ClientCnxn.java):425)
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWL3 in collection:ENPwl
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWL4 in collection:ENPwl
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWL1 in collection:ENPwl
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM org.apache.solr.cloud.ZkController$2 process
INFO: Updating live nodes:org.apache.solr.common.cloud.solrzkcli...@55308275
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Updating live nodes from ZooKeeper...
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category debug
FINE: Reading reply sessionid:0x12a97312613010b, packet:: clientPath:null 
serverPath:null finished:false header:: 541,8  replyHeader:: 541,-1,0  
request:: '/live_nodes,F  response:: 
v{'ob1078.nydc1.outbrain.com:8983_solr2,'ob1078.nydc1.outbrain.com:8983_solr1,'ob1061.nydc1.outbrain.com:8983_solr2,'ob1062.nydc1.outbrain.com:8983_solr1,'ob1062.nydc1.outbrain.com:8983_solr2,'ob1061.nydc1.outbrain.com:8983_solr1,'ob1077.nydc1.outbrain.com:8983_solr2,'ob1077.nydc1.outbrain.com:8983_solr1}
Aug 25, 2010 5:18:19 AM org.apache.log4j.Category error
SEVERE: Error while calling watcher
java.lang.IllegalArgumentException: Path cannot be null
at org.apache.zookeeper.common.PathUtils.validatePath(PathUtils.java:45)
at 
org.apache.zookeeper.ZooKeeper.getChildren(zookeeper:ZooKeeper.java):1196)
at 
org.apache.solr.common.cloud.SolrZkClient.getChildren(SolrZkClient.java:200)
at org.apache.solr.cloud.ZkController$2.process(ZkController.java:321)
at 
org.apache.zookeeper.ClientCnxn$EventThread.run(zookeeper:ClientCnxn.java):425)
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ConnectionManager process
INFO: Watcher org.apache.solr.common.cloud.connectionmana...@339bb448 
name:ZooKeeperConnection Watcher:zook1:2181,zook2:2181,zook3:2181 got event 
WatchedEvent: Server state change. New state: Disconnected path:null type:None
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader$4 process
INFO: Detected a shard change under ShardId:ENPWLMaster_1 in 
collection:ENPwlMaster
Aug 25, 2010 5:18:19 AM org.apache.solr.common.cloud.ZkStateReader 
updateCloudState
INFO: Cloud state update for ZooKeeper already scheduled
Aug 25, 2010 5:18:19 AM 

question: havnig multiple solrCloud configuration on the same machine

2010-08-15 Thread Yatir Ben Shlomo
Hi!
I am using SolrCloud with Tomcat 5.5.
In my setup every language has its own index and its own Solr filters, so it
needs separate Solr configuration files.

In the SolrCloud examples posted here: http://wiki.apache.org/solr/SolrCloud
I noticed that bootstrap_confdir is given as a global -D parameter,
but I need to be able to supply it per core.
I tried doing this in solr.xml but failed:

solr.xml
<core name='coreES' instanceDir='coreES/'>
  <property name='dataDir' value='/Data/Solr/coreES' />
  <property name='bootstrap_confdir' value='/home/tomcat/solr/coreES/conf'/>
</core>

All my cores are using the same ZooKeeper configuration according to the
-Dbootstrap_confdir=...

Does anyone know how I can specify the bootstrap_confdir on a per-core basis?
Thanks

Yatir Ben Shlomo
Outbrain Engineering
yat...@outbrain.com
tel: +972-73-223912
fax: +972-9-8350055
www.outbrain.com



question: solrCloud with multiple cores on each machine

2010-07-27 Thread Yatir Ben Shlomo
Hi
I am using SolrCloud.
Suppose I have a total of 4 machines dedicated to Solr.
I want to have 2 machines as replicas (slaves) and 2 masters,
but I want to work with 8 logical cores rather than 2,
i.e. each master (and each slave) will have 4 cores on it.
The reason is that I can optimize the cores one at a time, so the IO intensity
at any given moment will be low and will not degrade the online performance.

Is there a way to configure my solr.xml so that when I am doing a distributed
search (distrib=true) it will know to query all 8 cores?

Thanks
Yatir


Re: How real-time are Soir/Lucene queries?

2010-05-21 Thread Ben Eliott

You may wish to look at  Lucandra: http://github.com/tjake/Lucandra

On 21 May 2010, at 06:12, Walter Underwood wrote:

Solr is a very good engine, but it is not real-time. You can turn  
off the caches and reduce the delays, but it is fundamentally not  
real-time.


I work at MarkLogic, and we have a real-time transactional search  
engine (and respository). If you are curious, contact me directly.


I do like Solr for lots of applications -- I chose it when I was at  
Netflix.


wunder

On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:


Hello Soir,

Soir looks like an excellent API and its nice to have a tutorial  
that makes it easy to discover the basics of what Soir does, I'm  
impressed. I can see plenty of potential uses of Soir/Lucene and  
I'm interested now in just how real-time the queries made to an  
index can be?


For example, in my application I have time ordered data being  
processed by a paint method in real-time. Each piece of data is  
identified and its associated renderer is invoked. The Java2D  
renderer would then lookup any layout and style values it requires  
to render the current data it has received from the layout and  
style indexes. What I'm wondering is if this lookup which would be  
a Lucene search will be fast enough?


Would it be best to make Lucene queries for the relevant layout and  
style values required by the renderers ahead of rendering time and  
have the query results placed into the most performant collection  
(map/array) so renderer lookup would be as fast as possible? Or can  
Lucene handle many individual lookup queries fast enough so  
rendering is quick?


Best regards from Canada,

Thom











Re: How real-time are Soir/Lucene queries?

2010-05-21 Thread Ben Eliott
Further to my earlier note re Lucandra: I note that Cassandra, which
Lucandra backs onto, is 'eventually consistent', so given your real-time
requirements you may want to review this in the first instance,
if Lucandra is of interest.


On 21 May 2010, at 06:12, Walter Underwood wrote:

Solr is a very good engine, but it is not real-time. You can turn  
off the caches and reduce the delays, but it is fundamentally not  
real-time.


I work at MarkLogic, and we have a real-time transactional search  
engine (and respository). If you are curious, contact me directly.


I do like Solr for lots of applications -- I chose it when I was at  
Netflix.


wunder

On May 20, 2010, at 7:22 PM, Thomas J. Buhr wrote:


Hello Solr,

Solr looks like an excellent API, and it's nice to have a tutorial  
that makes it easy to discover the basics of what Solr does; I'm  
impressed. I can see plenty of potential uses of Solr/Lucene and  
I'm interested now in just how real-time the queries made to an  
index can be.


For example, in my application I have time ordered data being  
processed by a paint method in real-time. Each piece of data is  
identified and its associated renderer is invoked. The Java2D  
renderer would then lookup any layout and style values it requires  
to render the current data it has received from the layout and  
style indexes. What I'm wondering is if this lookup which would be  
a Lucene search will be fast enough?


Would it be best to make Lucene queries for the relevant layout and  
style values required by the renderers ahead of rendering time and  
have the query results placed into the most performant collection  
(map/array) so renderer lookup would be as fast as possible? Or can  
Lucene handle many individual lookup queries fast enough so  
rendering is quick?


Best regards from Canada,

Thom











Re: Custom sort

2009-07-10 Thread Ben
It could be that you should be providing an implementation of 
SortComparatorSource.
I have missed the earlier part of this thread; I assume you're trying to 
implement some form of custom search?


B

dontthinktwice wrote:


Marc Sturlese wrote:
  

I have been able to create my custom field. The problem is that I have
loaded into the Solr core a couple of HashMaps<id_doc, value_influence_sort>
from a DB with values that will influence the sort. My problem is that
I don't know how to let my custom sort have access to these HashMaps.
I am a bit confused now. I think it would be easy to reach my goal
using:

CustomSortComponent extends SearchComponent implements SolrCoreAware

This way, I would load the HashMaps in the inform method and would
create the custom sort using the HashMaps in the prepare method.

Don't know how to do that with the CustomField (similar to the
RandomField)... any advice?





Marc, did you get this working somehow? I'm looking at doing something
similar, and before I make a custom sort field (like RandomSortField) I
would be delighted to know that I can give it access to the data structure
it will need to calculate the sort...

  




Re: DocSlice andNotSize

2009-07-03 Thread Ben
DocSet isn't an object, it's an interface; the DocSlice class 
*implements* DocSet.
What you're saying about set operations not working for DocSlice but 
working for DocSet then doesn't make any sense... can you clarify?


The failure of these set operations to work as expected is confusing the 
hell out of me too!


Thanks
Ben


Yonik Seeley wrote:

On Thu, Jul 2, 2009 at 4:24 PM, Candide Kemmlercand...@palacehotel.org wrote:
  

I have a simple question regarding the DocSlice class. I'm trying to use the (very
handy) set operations on DocSlices and I'm rather confused by the way it
behaves.

I have 2 DocSlices, atlDocs which, by looking at the debugger, holds a
docs array of ints of size 1; the second DocSlice is btlDocs, with a
docs array of ints of size 67. I know that atlDocs is a subset of btlDocs,
so doing btlDocs.andNotSize(atlDocs) should really return 66.

But it's returning 10.



The short answer is that all of the set operations were only designed
for DocSets (as opposed to DocLists).
Yes, perhaps DocList should not have extended DocSet...

-Yonik
http://www.lucidimagination.com
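
Concretely, if I read Yonik right, the counts come out as expected once both sides
are DocSets obtained from the searcher rather than DocLists. A rough sketch against
the Solr 1.x internal API (the field and term values are placeholders):

import org.apache.lucene.index.Term;
import org.apache.lucene.search.TermQuery;
import org.apache.solr.search.DocSet;
import org.apache.solr.search.SolrIndexSearcher;

public class AndNotSizeExample {
    // Count docs matching the "btl" query but not the "atl" query, using DocSets.
    public static int andNotSize(SolrIndexSearcher searcher) throws java.io.IOException {
        DocSet atlDocs = searcher.getDocSet(new TermQuery(new Term("level", "atl")));
        DocSet btlDocs = searcher.getDocSet(new TermQuery(new Term("level", "btl")));
        return btlDocs.andNotSize(atlDocs);
    }
}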
  




Re: Excluding characters from a wildcard query - More Info - Is this difficult, or am I being ignored because it's too obvious to merit an answer?

2009-07-01 Thread Ben


Ben wrote:

The exception SOLR raises is :

org.apache.lucene.queryParser.ParseException: Cannot parse 
'vector:_*[^_]*_[^_]*_[^_]*': Encountered ] at line 1, column 12.

Was expecting one of:
   TO ...
   RANGEIN_QUOTED ...
   RANGEIN_GOOP ...
 
Ben wrote:
Passing in a RegularExpression like [^_]*_[^_]* (e.g. matching 
anything with an underscore in the string) using some code like :


...
parameters.add("fq", "vector:[^_]*_[^_]*");
...

seems to cause problems for SOLR, I assume because of the [ or ^ 
character.


Can somebody please advise how to handle character exclusion in such 
searches?


Any help or pointers are much appreciated!

Thanks

Ben






Re: Excluding characters from a wildcard query

2009-07-01 Thread Ben
Yes, I had done that... however, I'm beginning to see now that what I am 
doing is called a wildcard query, which goes via Lucene's query parser.
Lucene's query parser doesn't support the regexp idea of character 
exclusion ... i.e. I'm not trying to match a literal '[', I'm trying to express 
"match as many characters as possible which are not underscores" with [^_]*


Perhaps I'm going about my whole problem in an ineffective way, but I'm 
not sure how I can sensibly describe what I'm doing without it becoming 
a long document.


The only other approach I can think of is to change what I'm indexing 
but I'm not sure how to achieve that.

I've tried explaining it once, and obviously failed, so I'll try again.

I'm given a string containing many vectors (where each dimension is 
separated by an underscore, and each vector is separated by a comma) e.g.


A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3

I want my facet query to tell me if, within one of the vectors within 
that string, there is a match for dimensions I'm interested in. Of the 
four dimensions in this example, I may choose to fix an arbitrary number 
of them with values, and the rest with wildcards e.g. I might look for a 
facet containing Ox_*_*_* so one of the vectors in the string must have 
its first dimension matching Ox and I don't care about the rest.


***Is there a way to break down this string on the comma's so that I can 
apply a normal wildcard query and SOLR applies it to each 
individually?*** That would solve all my problems :

e.g.
The string is internally represented in lucene/solr as
A1_B1_C1_D1
A2_B2_C2_D2
A3_B3_C3_D3

where it tries to match the wildcard query on each in turn?

Thanks for your help; I'm deeply confused about this at the moment...

Ben


Re: Excluding characters from a wildcard query

2009-07-01 Thread Ben
Is there a way in the Schema to specify that the comma should be used to 
split the values up? 
e.g. Can I specify my vector field as multivalue and also specify some 
sort of tokeniser to automatically split on commas?


Ben


Uwe Klosa wrote:

You should split the strings at the comma yourself and store the values in a
multivalued field. Then wildcard searches like A1_* are not a problem. I don't
know much about facets, but if they work on multivalued fields there
should be no problem at all.

Uwe

2009/7/1 Ben b...@autonomic.net

  

Yes, I had done that... however, I'm beginning to see now that what I am
doing is called a wildcard query which is going via Lucene's queryparser.
Lucene's query parser doesn't support the regexp idea of character
exclusion ... i.e. I'm not trying to match a literal '[', I'm trying to express "match
as many characters as possible which are not underscores" with [^_]*

Perhaps I'm going about my whole problem in an ineffective way, but I'm not
sure how I can sensibly describe what I'm doing without it becoming a long
document.

The only other approach I can think of is to change what I'm indexing but
I'm not sure how to achieve that.
I've tried explaining it once, and obviously failed, so I'll try again.

I'm given a string containing many vectors (where each dimension is
separated by an underscore, and each vector is separated by a comma) e.g.

A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3

I want my facet query to tell me if, within one of the vectors within that
string, there is a match for dimensions I'm interested in. Of the four
dimensions in this example, I may choose to fix an arbitrary number of them
with values, and the rest with wildcards e.g. I might look for a facet
containing Ox_*_*_* so one of the vectors in the string must have its first
dimension matching Ox and I don't care about the rest.

***Is there a way to break down this string on the comma's so that I can
apply a normal wildcard query and SOLR applies it to each individually?***
That would solve all my problems :
e.g.
The string is internally represented in lucene/solr as
A1_B1_C1_D1
A2_B2_C2_D2
A3_B3_C3_D3

where it tries to match the wildcard query on each in turn?

Thanks for your help; I'm deeply confused about this at the moment...

Ben




  




Re: Excluding characters from a wildcard query

2009-07-01 Thread Ben

I'm not quite sure I understand exactly what you mean.
The string I'm processing could have many tens of thousands of values... 
I hope you aren't implying I'd need to split it into many tens of 
thousands of columns.


If you're saying what I think you're saying, you're saying that I should 
leave whitespaces between the individual parts of the string, pass in 
the string into a multiValued field and have SOLR internally treat 
each word as an individual entity? 


Thanks for your help with this...

Ben

Uwe Klosa wrote:

To get the desired effect I described, you have to do the split before you
send the document to Solr. I'm not aware of an analyzer that can split one
field value into several field values. The analyzers and tokenizers do
create tokens from field values in many different ways.

As I see it you have to do some preprocessing yourself.

Uwe

2009/7/1 Ben b...@autonomic.net

  

Is there a way in the Schema to specify that the comma should be used to
split the values up? e.g. Can I specify my vector field as multivalue and
also specify some sort of tokeniser to automatically split on commas?

Ben



Uwe Klosa wrote:



You should split the strings at the comma yourself and store the values in a
multivalued field. Then wildcard searches like A1_* are not a problem. I don't
know much about facets, but if they work on multivalued fields there
should be no problem at all.

Uwe

2009/7/1 Ben b...@autonomic.net



  

Yes, I had done that... however, I'm beginning to see now that what I am
doing is called a wildcard query which is going via Lucene's
queryparser.
Lucene's query parser doesn't support the regexp idea of character
exclusion ... i.e. I'm not trying to match a literal '[', I'm trying to express
"match as many characters as possible which are not underscores" with [^_]*

Perhaps I'm going about my whole problem in an ineffective way, but I'm
not
sure how I can sensibly describe what I'm doing without it becoming a
long
document.

The only other approach I can think of is to change what I'm indexing but
I'm not sure how to achieve that.
I've tried explaining it once, and obviously failed, so I'll try again.

I'm given a string containing many vectors (where each dimension is
separated by an underscore, and each vector is separated by a comma) e.g.

A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3

I want my facet query to tell me if, within one of the vectors within
that
string, there is a match for dimensions I'm interested in. Of the four
dimensions in this example, I may choose to fix an arbitrary number of
them
with values, and the rest with wildcards e.g. I might look for a facet
containing Ox_*_*_* so one of the vectors in the string must have its
first
dimension matching Ox and I don't care about the rest.

***Is there a way to break down this string on the comma's so that I can
apply a normal wildcard query and SOLR applies it to each
individually?***
That would solve all my problems :
e.g.
The string is internally represented in lucene/solr as
A1_B1_C1_D1
A2_B2_C2_D2
A3_B3_C3_D3

where it tries to match the wildcard query on each in turn?

Thanks for your help; I'm deeply confused about this at the moment...

Ben






  



  




Re: Excluding characters from a wildcard query

2009-07-01 Thread Ben
My brain was switched off.  I'm using SolrJ, which means I'll need to 
make multiple calls like:


addMultipleFields(solrDoc, "vector", vectorvalue, 1.0f);

for each value to be added to the multiValuedField.

Then, with luck, the simple wildcard query will be executed over each 
individual value when looking for matches, meaning the simple query 
syntax can be made adequate to do what's needed.


Many thanks Uwe.

B
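
Spelled out, that plan might look roughly like the following, assuming the vector
field is declared multiValued="true" in schema.xml (addMultipleFields above is Ben's
own helper, so plain addField is used here and the URL and id are invented):

import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class VectorIndexer {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        String raw = "A1_B1_C1_D1,A2_B2_C2_D2,A3_B3_C3_D3";

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("id", "doc-1");
        // One value per vector, so a wildcard query like vector:Ox_*_*_*
        // is applied to each vector separately.
        for (String vector : raw.split(",")) {
            doc.addField("vector", vector, 1.0f);
        }

        solr.add(doc);
        solr.commit();
    }
}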

Uwe Klosa wrote:

2009/7/1 Ben b...@autonomic.net

  

I'm not quite sure I understand exactly what you mean.
The string I'm processing could have many tens of thousands of values... I
hope you aren't implying I'd need to split it into many tens of thousands of
columns.




No, that is not what I meant. It will be one field (column) with tens of
thousands of values.


  

If you're saying what I think you're saying, you're saying that I should
leave whitespaces between the individual parts of the string, pass in the
string into a multiValued field and have SOLR internally treat each word
as an individual entity?
Thanks for your help with this...




I said nothing about whitespaces. I don't know how you update your solr
documents. Are you using XML or Solrj?

Uwe

  




Building Solr index with Lucene

2009-07-01 Thread Ben Bangert
For performance reasons, we're attempting to build the index used with  
Solr directly in Lucene. It works fine for the most part, but I'm  
having an issue when it comes to stemming. I'm guessing this is due to a  
mismatch between how Lucene stems at index time and how Solr stems during its  
queries.


Has anyone built their Solr index using Lucene, and how did you handle  
stemmed fields in Lucene so that Solr worked properly with them?


Cheers,
Ben
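
No answer appears in the archive, but the usual suspect is an analyzer mismatch: the
chain used when writing the index with Lucene has to reproduce the schema.xml
index-time chain token for token. A rough Lucene 2.x-style sketch of such a chain
(the exact tokenizer and filters depend on the fieldType in question, and Solr's
EnglishPorterFilterFactory is Snowball-based, so the stemmer really does need checking):

import java.io.Reader;

import org.apache.lucene.analysis.Analyzer;
import org.apache.lucene.analysis.LowerCaseFilter;
import org.apache.lucene.analysis.PorterStemFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.WhitespaceTokenizer;

// Must mirror the index-time analyzer chain declared in schema.xml,
// otherwise Solr's query-time stemming won't line up with the stored terms.
public class SchemaMatchingAnalyzer extends Analyzer {
    @Override
    public TokenStream tokenStream(String fieldName, Reader reader) {
        TokenStream stream = new WhitespaceTokenizer(reader);
        stream = new LowerCaseFilter(stream);
        stream = new PorterStemFilter(stream);
        return stream;
    }
}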


Excluding characters from a wildcard query

2009-06-30 Thread Ben
Passing in a RegularExpression like [^_]*_[^_]* (e.g. matching 
anything with an underscore in the string) using some code like :


...
parameters.add("fq", "vector:[^_]*_[^_]*");
...

seems to cause problems for SOLR, I assume because of the [ or ^ character.

Can somebody please advise how to handle character exclusion in such 
searches?


Any help or pointers are much appreciated!

Thanks

Ben


Re: Excluding characters from a wildcard query - More Info

2009-06-30 Thread Ben

The exception SOLR raises is :

org.apache.lucene.queryParser.ParseException: Cannot parse 
'vector:_*[^_]*_[^_]*_[^_]*': Encountered ] at line 1, column 12.

Was expecting one of:
   TO ...
   RANGEIN_QUOTED ...
   RANGEIN_GOOP ...
  


Ben wrote:
Passing in a RegularExpression like [^_]*_[^_]* (e.g. matching 
anything with an underscore in the string) using some code like :


...
parameters.add("fq", "vector:[^_]*_[^_]*");
...

seems to cause problems for SOLR, I assume because of the [ or ^ 
character.


Can somebody please advise how to handle character exclusion in such 
searches?


Any help or pointers are much appreciated!

Thanks

Ben




Excluding Characters and SubStrings in a Faceted Wildcard Query

2009-06-29 Thread Ben

Hello,

I've been using SOLR for a while now, but am stuck for information on 
two issues :


1) Is it possible to exclude characters in a SOLR facet wildcard query?
e.g.
[^,]* to match any character except an ,  ?

2) Can one setup the facet wildcard query to return the exact sub 
strings it matched of the queried facet, rather than the whole string?


I hope somebody can help :)

Thanks,

Ben



Re: Excluding Characters and SubStrings in a Faceted Wildcard Query

2009-06-29 Thread Ben

Hi Erik,

I'm not sure exactly how much context you need here, so I'll try to keep 
it short and expand as needed.


The column I am faceting contains a comma-delimited set of vectors. 
Each vector is made up of {Make,Year,Model} e.g. 
_ford_1996_focus,mercedes_1996_clk,ford_2000_focus


I have a custom request handler, where if I want to find all the cars 
from 1996 I pass in a facet query for the Year (1996) which is 
transformed to a wildcard facet query :


_*_1996_*

In other words, it'll match any records whose vector column contains a 
string, which somewhere has a car from 1996.


Why not put the Make, Year and Model in separate columns and do a facet 
query of multiple columns?... because once we've selected 1996, we 
should (in the above example) then be offering ford and mercedes as 
further facet choices, and nothing more. If the parts were in their own 
columns, there would be no way to tie the Makes and Models to specific 
years, for example.


At any rate, the wildcard search returns the entire match 
(_ford_1996_focus,mercedes_1996_clk,ford_2000_focus). I then have to do 
another RegExp over it to extract only the two parts (the first ford and 
mercedes) that were from 1996. This isn't using SOLR's cache very 
effectively.


It would be excellent if SOLR could break up that comma separated list 
into three different parts, and run the RegExp over each , returning 
only those which match. Is that what you're implying with Analysis? If 
that were the case, I'd not need to worry about character exclusion.


Sorry if that's a bit fuzzy... it's hard trying to explain enough to be 
useful, but not too much that it turns into an essay!!!


Thanks,
Ben

The solution I'm using is to form a vector

Erik Hatcher wrote:

Ben,

Could you post an example of the type of data you're dealing with and 
how you want it handled?   I suspect there is a way to accomplish what 
you want using an analyzed field, or by preprocessing the data you're 
indexing.


Erik

On Jun 29, 2009, at 9:29 AM, Ben wrote:


Hello,

I've been using SOLR for a while now, but am stuck for information on 
two issues :


1) Is it possible to exclude characters in a SOLR facet wildcard query?
e.g.
[^,]* to match any character except an ,  ?

2) Can one setup the facet wildcard query to return the exact sub 
strings it matched of the queried facet, rather than the whole string?


I hope somebody can help :)

Thanks,

Ben






Sending Mlt POST request

2009-05-25 Thread Ohad Ben Porat
Hello,



I wish to send an MLT request to Solr and filter the result by a list of values 
for a specific field. The problem is that sometimes the list can include 
thousands of values, and it's impossible to send such a GET request.

Sending this request as POST didn't work well... Is POST supported by MLT? If 
not, is it supposed to be added in one of the next versions? Or is there maybe a 
different solution?



I will appreciate any help and advice,

Thanks,

Ohad.
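
Whether the MoreLikeThis handler accepts POST in that version is not settled in this
thread, but with SolrJ it is at least easy to try, since the HTTP method is a request
parameter; a sketch (the handler path, query, and field names below are assumptions):

import org.apache.solr.client.solrj.SolrRequest;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.request.QueryRequest;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.params.ModifiableSolrParams;

public class MltPostExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        ModifiableSolrParams params = new ModifiableSolrParams();
        params.set("qt", "/mlt");            // assumed MoreLikeThis handler path
        params.set("q", "id:12345");
        params.set("mlt.fl", "body");
        // In practice this filter list can contain thousands of values.
        params.set("fq", "category:(1 OR 2 OR 3)");

        // Sending as POST keeps the very long fq out of the request URI.
        QueryRequest req = new QueryRequest(params, SolrRequest.METHOD.POST);
        QueryResponse rsp = req.process(solr);
        System.out.println(rsp.getResults());
    }
}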



dismax query not working with 1.4

2009-03-26 Thread Ben Lavender
Hello,

I'm using the March 18th 1.4 nightly, and I can't get a dismax query
to return results.  The standard and partitioned query types return
data fine.  I'm using jetty, and the problem occurs with the default
solrconfig.xml as well as the one I am using, which is the Drupal
module, beta 6.  The problem occurs in the admin interface for solr,
though, not just in the end application.

And...that's it?  I don't know what else to say or offer other than
dismax doesn't work, and I'm not sure where else to go to
troubleshoot.  Any ideas?

Ben


Re: dismax query not working with 1.4

2009-03-26 Thread Ben Lavender
I do not have a qf set; this is the query generated by the admin interface:
dismax:
select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl=

standard:
select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

dismax has no results, standard has 30.

I don't see a requirement that qf be defined on
http://wiki.apache.org/solr/DisMaxRequestHandler; am I missing
something?

The query responses are the same with both the application-specific
and default solrconfig.xml's.  The application definition for dismax
is:
  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>

And the one from my nightly is:
  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
     <float name="tie">0.01</float>
     <str name="qf">
        text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4
     </str>
     <str name="pf">
        text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9
     </str>
     <str name="bf">
        ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3
     </str>
     <str name="fl">
        id,name,price,score
     </str>
     <str name="mm">
        2&lt;-1 5&lt;-2 6&lt;90%
     </str>
     <int name="ps">100</int>
     <str name="q.alt">*:*</str>
     <!-- example highlighter config, enable per-query with hl=true -->
     <str name="hl.fl">text features name</str>
     <!-- for this field, we want no fragmenting, just highlighting -->
     <str name="f.name.hl.fragsize">0</str>
     <!-- instructs Solr to return the field itself if no query terms are
          found -->
     <str name="f.name.hl.alternateField">name</str>
     <str name="f.text.hl.fragmenter">regex</str> <!-- defined below -->
    </lst>
  </requestHandler>

So there's no particular mention of any fields from schema.xml in
dismax, but the standard works without that.

Thanks for the responses,
Ben

On Thu, Mar 26, 2009 at 2:11 PM, Matt Mitchell goodie...@gmail.com wrote:
 Do you have qf set? Just last week I had a problem where no results were
 coming back, and it turned out that my qf param was empty.

 Matt

 On Thu, Mar 26, 2009 at 2:30 PM, Ben Lavender blaven...@gmail.com wrote:

 Hello,

 I'm using the March 18th 1.4 nightly, and I can't get a dismax query
 to return results.  The standard and partitioned query types return
 data fine.  I'm using jetty, and the problem occurs with the default
 solrconfig.xml as well as the one I am using, which is the Drupal
 module, beta 6.  The problem occurs in the admin interface for solr,
 though, not just in the end application.

 And...that's it?  I don't know what else to say or offer other than
 dismax doesn't work, and I'm not sure where else to go to
 troubleshoot.  Any ideas?

 Ben




Re: dismax query not working with 1.4

2009-03-26 Thread Ben Lavender
Did the XML in that message come through okay?  Gmail seems to be
eating it on my end.

Anyway, while the default config has those fields, it also fails with
the application config, which has:
  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
    </lst>
  </requestHandler>

Since this essentially the same as standard, I assumed it would work
without any qf.  I manually added a qf to the query with the
application solrconfig and got a result.  Off to debug the application
side!

Thank you very much for the help!

Ben
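
In config terms, the fix amounts to giving that handler a qf default; a minimal
sketch (the field names are placeholders for whatever the application schema
actually defines) would be something like:

  <requestHandler name="dismax" class="solr.SearchHandler">
    <lst name="defaults">
     <str name="defType">dismax</str>
     <str name="echoParams">explicit</str>
     <str name="qf">title^2.0 body^1.0</str>
    </lst>
  </requestHandler>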


On Thu, Mar 26, 2009 at 3:08 PM, Otis Gospodnetic
otis_gospodne...@yahoo.com wrote:

 Standard searches your default field (specified in schema.xml).
 DisMax searches fields you specify in DisMax config.
 Yours has:
  text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4

  But these are not your real fields.  Change that to your real fields in qf, 
  pf and other parts of the DisMax config and things should start working.

 Otis
 --
 Sematext -- http://sematext.com/ -- Lucene - Solr - Nutch



 - Original Message 
 From: Ben Lavender blaven...@gmail.com
 To: solr-user@lucene.apache.org
 Sent: Thursday, March 26, 2009 4:02:58 PM
 Subject: Re: dismax query not working with 1.4

 I do not have a qf set; this is the query generated by the admin interface:
 dismax:
  select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=dismax&wt=standard&explainOther=&hl.fl=

 standard:
  select?indent=on&version=2.2&q=test&start=0&rows=10&fl=*%2Cscore&qt=standard&wt=standard&explainOther=&hl.fl=

 dismax has no results, standard has 30.

 I don't see a requirement that qf be defined on
 http://wiki.apache.org/solr/DisMaxRequestHandler; am I missing
 something?

 The query responses are the same with both the application-specific
 and default solrconfig.xml's.  The application definition for dismax
 is:


     dismax
     explicit



 And the one from my nightly is:


     dismax
     explicit
     0.01

         text^0.5 features^1.0 name^1.2 sku^1.5 id^10.0 manu^1.1 cat^1.4


         text^0.2 features^1.1 name^1.5 manu^1.4 manu_exact^1.9


         ord(popularity)^0.5 recip(rord(price),1,1000,1000)^0.3


         id,name,price,score


         2-1 5-2 690%

     100
     *:*

     text features name

     0

     name
     regex



 So there's no particular mention of any fields from schema.xml in
 dismax, but the standard works without that.

 Thanks for the responses,
 Ben

 On Thu, Mar 26, 2009 at 2:11 PM, Matt Mitchell wrote:
  Do you have qf set? Just last week I had a problem where no results were
  coming back, and it turned out that my qf param was empty.
 
  Matt
 
  On Thu, Mar 26, 2009 at 2:30 PM, Ben Lavender wrote:
 
  Hello,
 
  I'm using the March 18th 1.4 nightly, and I can't get a dismax query
  to return results.  The standard and partitioned query types return
  data fine.  I'm using jetty, and the problem occurs with the default
  solrconfig.xml as well as the one I am using, which is the Drupal
  module, beta 6.  The problem occurs in the admin interface for solr,
  though, not just in the end application.
 
  And...that's it?  I don't know what else to say or offer other than
  dismax doesn't work, and I'm not sure where else to go to
  troubleshoot.  Any ideas?
 
  Ben
 
 




field range (min and max term)

2009-02-02 Thread Ben Incani
Hi Solr users,

Is there a method of retrieving a field range i.e. the min and max
values of that fields term enum.

For example I would like to know the first and last date entry of N
documents.

Regards,

-Ben
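
One way to get a field's min and max without walking the term enum by hand is the
StatsComponent (available in Solr 1.4 and later, if memory serves), which reports
min and max without fetching documents; a SolrJ sketch with an assumed date field:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.impl.CommonsHttpSolrServer;
import org.apache.solr.client.solrj.response.FieldStatsInfo;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FieldRangeExample {
    public static void main(String[] args) throws Exception {
        CommonsHttpSolrServer solr =
            new CommonsHttpSolrServer("http://localhost:8983/solr");

        SolrQuery q = new SolrQuery("*:*");
        q.setRows(0);                      // we only want the statistics
        q.set("stats", true);
        q.set("stats.field", "date");      // assumed date field name

        QueryResponse rsp = solr.query(q);
        FieldStatsInfo stats = rsp.getFieldStatsInfo().get("date");
        System.out.println("min=" + stats.getMin() + " max=" + stats.getMax());
    }
}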


RE: *Very* slow Commit after upgrading to solr 1.3

2008-10-08 Thread Ben Shlomo, Yatir
So other than me doing trial & error, do you have any guidance on how to
configure the merge factor (and ramBufferSizeMB?),
or any formula that supplies the optimal value?
Thanks,
Yatir

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Tuesday, October 07, 2008 1:10 PM
To: solr-user@lucene.apache.org
Subject: Re: *Very* slow Commit after upgrading to solr 1.3

On Tue, Oct 7, 2008 at 6:32 AM, Ben Shlomo, Yatir
[EMAIL PROTECTED] wrote:
 The problem is solved, see below.
 Since the performance is so sensitive to configuration - do you have a
 tip on how to determine the optimal configuration for
 mergeFactor, ramBufferSizeMB and other properties ?

The issue might have been your high merge factor coupled with changes
in how Lucene closes an index.  To prevent possible corruption on a
crash, Lucene now does an fsync on the index files before it writes
the new segment descriptor that references those files.  A high merge
factor means more segments, hence more segment files to sync on a
close.

-Yonik
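
There is no universal formula; given Yonik's explanation above, the usual hedge is to
start from the example solrconfig.xml values rather than a mergeFactor of 1000 and
adjust while watching indexing throughput and commit time, e.g. (illustrative common
starting points, not a recommendation tuned for this data set):

  <mergeFactor>10</mergeFactor>
  <ramBufferSizeMB>32</ramBufferSizeMB>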


 My original problem occurred even on a fresh rebuild of the index with
 solr 1.3
 To solve it I used the entire IndexWriter section settings from the
solr
 1.3 example file
 This had a dramatic impact:
 I indexed 20 GB of data (52M docs)
 The total indexing time was 13 hours
 The index size was 30 GB
 The total commit time was less than 2 minutes

 Tomcat Log for reference

 Oct 5, 2008 9:43:24 PM org.apache.solr.update.DirectUpdateHandler2
 commit
 INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
 Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher init
 INFO: Opening [EMAIL PROTECTED] main
 Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2
 commit
 INFO: end_commit_flush
 Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main


filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,

warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
 0.00,cumulative_inserts=0,cumulative_evictions=0}
 Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming result for [EMAIL PROTECTED] main


filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,

warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
 0.00,cumulative_inserts=0,cumulative_evictions=0}
 Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main


queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si

ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
 atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming result for [EMAIL PROTECTED] main


queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si

ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
 atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main


documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=

0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
 o=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
 INFO: autowarming result for [EMAIL PROTECTED] main


documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=

0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
 o=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore registerSearcher
 INFO: [] Registered new searcher [EMAIL PROTECTED] main
 Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher close
 INFO: Closing [EMAIL PROTECTED] main


filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,

warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
 0.00,cumulative_inserts=0,cumulative_evictions=0}


queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si

ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
 atio=0.00,cumulative_inserts=0,cumulative_evictions=0}


documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=

0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
 o=0.00,cumulative_inserts=0,cumulative_evictions=0}
 Oct 5, 2008 9:43:43 PM
 org.apache.solr.update.processor.LogUpdateProcessor finish
 INFO: {commit=} 0 18406
 Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore execute
 INFO: [] webapp=/dss1 path=/update params={} status=0 QTime=18406
 Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2
 commit
 INFO: start commit(optimize=true

RE: *Very* slow Commit after upgrading to solr 1.3

2008-10-07 Thread Ben Shlomo, Yatir
Thanks Yonik,

The problem is solved, see below.
Since the performance is so sensitive to configuration - do you have a
tip on how to determine the optimal configuration for 
mergeFactor, ramBufferSizeMB and other properties ?

My original problem occurred even on a fresh rebuild of the index with
solr 1.3
To solve it I used the entire IndexWriter section settings from the solr
1.3 example file
This had a dramatic impact:
I indexed 20 GB of data (52M docs)
The total indexing time was 13 hours
The index size was 30 GB
The total commit time was less than 2 minutes

Tomcat Log for reference

Oct 5, 2008 9:43:24 PM org.apache.solr.update.DirectUpdateHandler2
commit
INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening [EMAIL PROTECTED] main
Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2
commit
INFO: end_commit_flush
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
atio=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming [EMAIL PROTECTED] main from [EMAIL PROTECTED] main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
o=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher warm
INFO: autowarming result for [EMAIL PROTECTED] main

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
o=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore registerSearcher
INFO: [] Registered new searcher [EMAIL PROTECTED] main
Oct 5, 2008 9:43:43 PM org.apache.solr.search.SolrIndexSearcher close
INFO: Closing [EMAIL PROTECTED] main

filterCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=0,
warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitratio=
0.00,cumulative_inserts=0,cumulative_evictions=0}

queryResultCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,si
ze=0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitr
atio=0.00,cumulative_inserts=0,cumulative_evictions=0}

documentCache{lookups=0,hits=0,hitratio=0.00,inserts=0,evictions=0,size=
0,warmupTime=0,cumulative_lookups=0,cumulative_hits=0,cumulative_hitrati
o=0.00,cumulative_inserts=0,cumulative_evictions=0}
Oct 5, 2008 9:43:43 PM
org.apache.solr.update.processor.LogUpdateProcessor finish
INFO: {commit=} 0 18406
Oct 5, 2008 9:43:43 PM org.apache.solr.core.SolrCore execute
INFO: [] webapp=/dss1 path=/update params={} status=0 QTime=18406 
Oct 5, 2008 9:43:43 PM org.apache.solr.update.DirectUpdateHandler2
commit
INFO: start commit(optimize=true,waitFlush=false,waitSearcher=true)
Oct 5, 2008 9:45:07 PM org.apache.solr.search.SolrIndexSearcher init
INFO: Opening [EMAIL PROTECTED] main
Oct 5, 2008 9:45:07 PM org.apache.solr.update.DirectUpdateHandler2
commit
INFO: end_commit_flush


-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Yonik
Seeley
Sent: Saturday, October 04, 2008 6:07 PM
To: solr-user@lucene.apache.org
Subject: Re: *Very* slow Commit after upgrading to solr 1.3

Ben, see also

http://www.nabble.com/Commit-in-solr-1.3-can-take-up-to-5-minutes-td1980
2781.html#a19802781

What type of physical drive is this and what interface is used (SATA,
etc)?
What is the filesystem (NTFS)?

Did you add to an existing index from an older version of Solr, or
start from scratch?

If you add a single document to the index and commit

*Very* slow Commit after upgrading to solr 1.3

2008-09-29 Thread Ben Shlomo, Yatir
Hi!

 

I am running on Windows 64-bit ...
I have upgraded to solr 1.3 in order to use the distributed search.

I haven't changed the solrConfig and the schema xml files during the
upgrade.

I am indexing ~ 350K documents (each one is about 0.5 KB in size)

The indexing takes a reasonable amount of time (350 seconds)

See tomcat log:

INFO: {add=[8x-wbTscWftuu1sVWpdnGw==, VOu1eSv0obBl1xkj2jGjIA==,
YkOm-nKPrTVVVyeCZM4-4A==, rvaq_TyYsqt3aBc0KKDVbQ==,
9NdzWXsErbF_5btyT1JUjw==, ...(398728 more)]} 0 349875

 

But when I commit it takes more than an hour ! (5000 seconds!, the
optimize after the commit took 14 seconds)

INFO: start commit(optimize=false,waitFlush=false,waitSearcher=true)

 

P.S. It's not a machine problem; I moved to another machine and the same
thing happened.


I noticed something very strange during the time I wait for the commit:

While the solr index is 210MB in size

In the windows task manager I noticed that the java process is making a
HUGE amounts of IO reads:

It reads more than 350 GB ! (- which takes a lot of time.)

The process is constantly taking 25% of the cpu resources.

All my autowarmCount in Solrconfig  file do not exceed 256...

 

Any more ideas to check?

Thanks.

 

 

 

Here is part of my solrConfig file:

<indexDefaults>

  <!-- Values here affect all index writers and act as a default unless
       overridden. -->

  <useCompoundFile>false</useCompoundFile>

  <mergeFactor>1000</mergeFactor>

  <maxBufferedDocs>1000</maxBufferedDocs>

  <maxMergeDocs>2147483647</maxMergeDocs>

  <maxFieldLength>1</maxFieldLength>

  <writeLockTimeout>1000</writeLockTimeout>

  <commitLockTimeout>1</commitLockTimeout>

</indexDefaults>

<mainIndex>

  <!-- options specific to the main on-disk lucene index -->

  <useCompoundFile>false</useCompoundFile>

  <mergeFactor>1000</mergeFactor>

  <maxBufferedDocs>1000</maxBufferedDocs>

  <maxMergeDocs>2147483647</maxMergeDocs>

  <maxFieldLength>1</maxFieldLength>

  <!-- If true, unlock any held write or commit locks on startup.
       This defeats the locking mechanism that allows multiple
       processes to safely access a lucene index, and should be
       used with care. -->

  <unlockOnStartup>true</unlockOnStartup>

</mainIndex>

 

 

 

 

 

Yatir Ben-shlomo | eBay, Inc. | Classification Track, Shopping.com
(Israel) | w: +972-9-892-1373 |  email: [EMAIL PROTECTED] |

 



help required: how to design a large scale solr system

2008-09-24 Thread Ben Shlomo, Yatir
Hi!

I am already using solr 1.2 and happy with it.

In a new project with very tight dead line (10 development days from
today) I need to setup a more ambitious system in terms of scale
Here is the spec:

 

* I need to index about 60,000,000
documents 

* Each document has 11 textual fields to be indexed & stored
and 4 more fields to be stored only 

* Most fields are short (2-14 characters) however 2 indexed
fields can be up to 1KB and another stored field is up to 1KB 

* On average every document is about 0.5 KB to be stored and
0.4KB to be indexed 

* The SLA for data freshness is a full nightly re-index ( I
cannot obtain an incremental update/delete lists of the modified
documents) 

* The SLA for query time is 5 seconds 

* the number of expected queries is 2-3 queries per second 

* the queries are simply a combination of Boolean operations and
name searches (no fancy fuzzy searches or Levenshtein distances, no
faceting, etc.) 

* I have a 64 bit Dell 2950 4-cpu machine  (2 dual cores ) with
RAID 10, 200 GB HD space, and 8GB RAM memory 

* The documents are not given to me explicitly - I am given
raw documents in RAM, one by one, from which I create my document in
RAM,
and then I can either HTTP-post it to index it directly or append it to
a TSV file for later indexing 

* Each document has a unique ID

 

I have a few directions I am thinking about

 

The simple approach

* Have one solr instance that will index
the entire document set (from files). I am afraid this will take too
much time

 

Direction 1

* Create TSV files from all the
documents - this will take around 3-4 hours 

* Have all the documents partitioned
into several subsets (how many should I choose? ) 

* Have multiple solr instances on the
same machine 

* Let each solr instance concurrently
index the appropriate subset 

* At the end merge all the indices using
the IndexMergeTool - (how much time will it take ?)

 

Direction 2

* Like  the previous but instead of
using the IndexMergeTool , use distributed search with shards (upgrading
to solr 1.3)

 

Direction 3,4

* Like previous directions only avoid
using TSV files at all and directly index the documents from RAM

Questions:

* Which direction do you recommend in order to meet the SLAs in
the fastest way? 

* Since I have RAID on the machine can I gain performance by
using multiple solr instances on the same machine or only multiple
machines will help me 

* What's the minimal number of machines I should require (I
might get more weaker machines) 

* How many concurrent indexers are recommended? 

* Do you agree that the bottle neck is the indexing time?

Any help is appreciated 

Thanks in advance

yatir

 



RE: help required: how to design a large scale solr system

2008-09-24 Thread Ben Shlomo, Yatir
Thanks Mark!.
Do you have any comment regarding the performance differences between
indexing TSV files as opposed to directly indexing each document via
http post?

-Original Message-
From: Mark Miller [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, September 24, 2008 2:12 PM
To: solr-user@lucene.apache.org
Subject: Re: help required: how to design a large scale solr system


 From my limited experience:

I think you might have a bit of trouble getting 60 mil docs on a single 
machine. Cached queries will probably still be *very* fast, but non 
cached queries are going to be very slow in many cases. Is that 5 
seconds for all queries? You will never meet that on first run queries 
with 60mil docs on that machine. The light query load might make things 
workable...but your near the limits of a single machine (4 core or not) 
with 60 mil. You want to use a very good stopword list...common term 
queries will be killer. The docs being so small will be your only 
possible savior if you go the one machine route - that and cached hits. 
You don't have enough ram to get as much of the filesystem into RAM as 
youd like for 60 mil docs either.

I think you might try two machines with 30, 3 with 20, or 4 with 15. The

more you spread, even with slower machines, the faster your likely to 
index, which as you say, will take a long time for 60 mil docs (start 
today g). Multiple machines will help the indexing speed the most for 
sure - its still going to take a long time.

I don't think you will get much advantage using more than one solr 
install on a single machine - if you do, that should be addressed in the

code, even with RAID.

So I say, spread if you can. Faster indexing, faster search, easy to 
expand later. Distributed search is so easy with solr 1.3, you wont 
regret it. I think there is a bug to be addressed if your needing this 
in a week though - in my experience, with distributed search, for every 
million docs on a machine beyond the first, you lose a doc in a search 
across all machines (ie 1 mil on machine 1, 1 million on machine 2, a 
*:* search will be missing 1 doc. 10 mil each on 3 machines, a *:* 
search will be missing 30. Not a big deal, but could be a concern for 
some with picky, look at everything customers.

- Mark

Ben Shlomo, Yatir wrote:
 Hi!

 I am already using solr 1.2 and happy with it.

 In a new project with very tight dead line (10 development days from
 today) I need to setup a more ambitious system in terms of scale
 Here is the spec:

  

 * I need to index about 60,000,000
 documents 

 * Each document has 11 textual fields to be indexed &
stored
 and 4 more fields to be stored only 

 * Most fields are short (2-14 characters) however 2 indexed
 fields can be up to 1KB and another stored field is up to 1KB 

 * On average every document is about 0.5 KB to be stored and
 0.4KB to be indexed 

 * The SLA for data freshness is a full nightly re-index ( I
 cannot obtain an incremental update/delete lists of the modified
 documents) 

 * The SLA for query time is 5 seconds 

 * the number of expected queries is 2-3 queries per second 

 * the queries are simple a combination of Boolean operation
and
 name searches (no fancy fuzzy searches and levinstien distances, no
 faceting, etc) 

 * I have a 64 bit Dell 2950 4-cpu machine  (2 dual cores )
with
 RAID 10, 200 GB HD space, and 8GB RAM memory 

 * The documents are not given to me explicitly - I am given a
 raw-documents in RAM - one by one, from which I create my document in
 RAM.
 and then I can either http-post is to index it directly or append it
to
 a tsv file for later indexing 

 * Each document has a unique ID

  

 I have a few directions I am thinking about

  

 The simple approach

 * Have one solr instance that will
index
 the entire document set (from files). I am afraid this will take too
 much time

  

 Direction 1

 * Create TSV files from all the
 documents - this will take around 3-4 hours 

 * Have all the documents partitioned
 into several subsets (how many should I choose? ) 

 * Have multiple solr instances on the
 same machine 

 * Let each solr instance concurrently
 index the appropriate subset 

 * At the end merge all the indices
using
 the IndexMergeTool - (how much time will it take ?)

  

 Direction 2

 * Like  the previous but instead of
 using the IndexMergeTool , use distributed search with shards
(upgrading
 to solr 1.3)

  

 Direction 3,4

 * Like previous directions only avoid
 using TSV files at all and directly index the documents from RAM

 Questions:

 * Which direction do you recommend in order to meet

Changing Solr Query Syntax

2008-03-18 Thread Ben Sanchez
Hi solr users,
I need to change the query format for Solr a little bit. How can I
accomplish this? I don't want to modify the underlying Lucene query
specification, but just the way I query the index through the GET HTTP
method in Solr.
Thanks a lot for your help.

Ben


Re: Changing Solr Query Syntax

2008-03-18 Thread Ben Sanchez
Shalin, thanks a lot for answering that fast.

Use Case:
I'm migrating from a proprietary index server (XYZ) to Solr. All my
applications and my customers' applications rely on the query specification
of XYZ. It would be hard to modify all those apps to use the Solr Query
Syntax (although it would be ideal; Solr's query syntax is a lot superior to that
of XYZ, but it's impractical).



On Tue, Mar 18, 2008 at 9:50 AM, Shalin Shekhar Mangar 
[EMAIL PROTECTED] wrote:

 Hi Ben,

 It would be nice if you can tell us your use-case so that we can be
 more helpful.

 Why does the normal query syntax not work well for you? What are you
 trying to accomplish? Maybe there is an easier way.

 On Tue, Mar 18, 2008 at 8:17 PM, Ben Sanchez [EMAIL PROTECTED] wrote:
  Hi solr users,
   I need to change the query format for solr a little bit. How can I
   accomplish this. I don't wan to modify the underlying lucene query
   specification but just the way I query the index through the the GET
 http
   method in solr.
   Thanks a lot for your help.
 
   Ben
 



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Changing Solr Query Syntax

2008-03-18 Thread Ben Sanchez
Hi Shalin, thanks a lot for answering that fast.

Use Case:
I'm migrating from a proprietary index server (XYZ) to Solr. All my
applications and my customers' applications rely on the query specification
of XYZ. It would be hard to modify all those apps to use the Solr Query
Syntax (although it would be ideal; Solr's query syntax is a lot superior to that
of XYZ).

Basically I need  to replace : with = ; + with / and = with :  in the query
syntax.

Thank you.

On Tue, Mar 18, 2008 at 9:50 AM, Shalin Shekhar Mangar 
[EMAIL PROTECTED] wrote:

 Hi Ben,

 It would be nice if you can tell us your use-case so that we can be
 more helpful.

 Why does the normal query syntax not work well for you? What are you
 trying to accomplish? Maybe there is an easier way.

 On Tue, Mar 18, 2008 at 8:17 PM, Ben Sanchez [EMAIL PROTECTED] wrote:
  Hi solr users,
   I need to change the query format for solr a little bit. How can I
   accomplish this. I don't wan to modify the underlying lucene query
   specification but just the way I query the index through the the GET
 http
   method in solr.
   Thanks a lot for your help.
 
   Ben
 



 --
 Regards,
 Shalin Shekhar Mangar.



Re: Changing Solr Query Syntax

2008-03-18 Thread Ben Sanchez
Shalin, Thanks a lot. I'll do that.

On Tue, Mar 18, 2008 at 11:13 AM, Shalin Shekhar Mangar 
[EMAIL PROTECTED] wrote:

 Hi Ben,

 If I had to do this, I would start by adding a custom
 javax.servlet.Filter into Solr. It should work fine since all you're
 doing is replacing characters in the q parameter for requests coming
 into /select handler. It's a bit hackish but that's exactly what
 you're trying to do :)

 Don't know if there's an alternate/easier way.

 On Tue, Mar 18, 2008 at 9:30 PM, Ben Sanchez [EMAIL PROTECTED] wrote:
  Hi Shalin, thanks a lot for answering that fast.
 
 
   Use Case:
   I'm migrating from a proprietary index server (XYZ)  to Solr. All my
   applications and my customer's applications relay on the query
 specification
   of XYZ. It would be hard to modify all those apps to use the Solr Query
   Syntax (although, it would be ideal, Sorl query is a lot superior than
 that
   of XYZ).
 
   Basically I need  to replace : with = ; + with / and = with :  in the
 query
   syntax.
 
   Thank you.
 
 
   On Tue, Mar 18, 2008 at 9:50 AM, Shalin Shekhar Mangar 
   [EMAIL PROTECTED] wrote:
 
 
 
   Hi Ben,
   
It would be nice if you can tell us your use-case so that we can be
more helpful.
   
Why does the normal query syntax not work well for you? What are you
trying to accomplish? Maybe there is an easier way.
   
On Tue, Mar 18, 2008 at 8:17 PM, Ben Sanchez [EMAIL PROTECTED]
 wrote:
 Hi solr users,
  I need to change the query format for solr a little bit. How can I
  accomplish this. I don't wan to modify the underlying lucene query
  specification but just the way I query the index through the the
 GET
http
  method in solr.
  Thanks a lot for your help.

  Ben

   
   
   
--
Regards,
Shalin Shekhar Mangar.
   
 



 --
 Regards,
 Shalin Shekhar Mangar.
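
A rough sketch of the Filter Shalin describes above, wrapping the request so the q
parameter is rewritten before Solr sees it (the character mapping and class name are
illustrative, and depending on how Solr reads parameters you may need to override
more of the wrapper than shown here):

import java.io.IOException;

import javax.servlet.Filter;
import javax.servlet.FilterChain;
import javax.servlet.FilterConfig;
import javax.servlet.ServletException;
import javax.servlet.ServletRequest;
import javax.servlet.ServletResponse;
import javax.servlet.http.HttpServletRequest;
import javax.servlet.http.HttpServletRequestWrapper;

public class QuerySyntaxFilter implements Filter {

    public void init(FilterConfig config) throws ServletException {}

    public void destroy() {}

    public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain)
            throws IOException, ServletException {
        HttpServletRequest httpReq = (HttpServletRequest) req;
        chain.doFilter(new HttpServletRequestWrapper(httpReq) {
            @Override
            public String getParameter(String name) {
                String value = super.getParameter(name);
                return "q".equals(name) ? rewrite(value) : value;
            }

            @Override
            public String[] getParameterValues(String name) {
                String[] values = super.getParameterValues(name);
                if (values == null || !"q".equals(name)) {
                    return values;
                }
                String[] rewritten = new String[values.length];
                for (int i = 0; i < values.length; i++) {
                    rewritten[i] = rewrite(values[i]);
                }
                return rewritten;
            }
        }, res);
    }

    // Translate the legacy XYZ syntax in one pass, so ':' and '='
    // don't clobber each other (the actual mapping rules are the app's own).
    private String rewrite(String q) {
        if (q == null) {
            return null;
        }
        StringBuilder out = new StringBuilder(q.length());
        for (char c : q.toCharArray()) {
            switch (c) {
                case ':': out.append('='); break;
                case '+': out.append('/'); break;
                case '=': out.append(':'); break;
                default:  out.append(c);
            }
        }
        return out.toString();
    }
}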



solr web admin

2007-12-19 Thread Ben Incani
why does the web admin append core=null to all the requests?

e.g. admin/get-file.jsp?core=nullfile=schema.xml


retrieve lucene doc id

2007-12-16 Thread Ben Incani
how do I retrieve the lucene doc id in a query?

-Ben


RE: lowercase text/strings to be used in list box

2007-10-21 Thread Ben Incani
sorry - this should have been posted on the Lucene user list.

...the solution is to use the lucene PerFieldAnalyzerWrapper and add the
field with the KeywordAnalyzer then pass the PerFieldAnalyzerWrapper to
the QueryParser.

-Ben
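
For the archives, the wrapper arrangement described above looks roughly like this
(Lucene 2.x-era API; the field names and query are invented for illustration):

import org.apache.lucene.analysis.KeywordAnalyzer;
import org.apache.lucene.analysis.PerFieldAnalyzerWrapper;
import org.apache.lucene.analysis.standard.StandardAnalyzer;
import org.apache.lucene.queryParser.QueryParser;
import org.apache.lucene.search.Query;

public class ListBoxQueryExample {
    public static void main(String[] args) throws Exception {
        // StandardAnalyzer everywhere, except the untokenized list-box field.
        PerFieldAnalyzerWrapper analyzer =
            new PerFieldAnalyzerWrapper(new StandardAnalyzer());
        analyzer.addAnalyzer("my_field", new KeywordAnalyzer());

        QueryParser parser = new QueryParser("text", analyzer);
        Query q = parser.parse("my_field:\"the value\"");
        System.out.println(q);
    }
}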

 -Original Message-
 From: Ben Incani [mailto:[EMAIL PROTECTED] 
 Sent: Friday, 19 October 2007 5:52 PM
 To: solr-user@lucene.apache.org
 Subject: lowercase text/strings to be used in list box
 
 I have a field which will only contain several values (that 
 include spaces).
 
 I want to display a list box with all possible values by 
 browsing the lucene terms.
 
 I have setup a field in the schema.xml file.
 
  <fieldtype name="text_lc" class="solr.TextField">
    <analyzer>
      <tokenizer class="solr.KeywordTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>
 
 I also tried;
 
  <fieldtype name="string_lc" class="solr.StrField">
    <analyzer>
      <tokenizer class="solr.LowerCaseTokenizerFactory"/>
      <filter class="solr.LowerCaseFilterFactory"/>
    </analyzer>
  </fieldtype>
 
 
 This allows me to browse all the values no problem, but when 
 it comes to search the documents I have to use the lucene 
 org.apache.lucene.analysis.KeywordAnalyzer, when I would 
 rather use the 
 org.apache.lucene.analysis.standard.StandardAnalyzer and the 
 power of the default query parser to perform a phrase query 
 such as my_field:(the
 value) or my_field:"the value", which don't work?
 
 So is there a way to prevent tokenisation of a field using 
 the StandardAnalyzer, without implementing your own TokenizerFactory?
 
 Regards
 
 Ben
 


lowercase text/strings to be used in list box

2007-10-19 Thread Ben Incani
I have a field which will only contain several values (that include
spaces).

I want to display a list box with all possible values by browsing the
lucene terms.

I have setup a field in the schema.xml file.

<fieldtype name="text_lc" class="solr.TextField">
  <analyzer>
    <tokenizer class="solr.KeywordTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>

I also tried;

<fieldtype name="string_lc" class="solr.StrField">
  <analyzer>
    <tokenizer class="solr.LowerCaseTokenizerFactory"/>
    <filter class="solr.LowerCaseFilterFactory"/>
  </analyzer>
</fieldtype>


This allows me to browse all the values no problem, but when it comes to
search the documents I have to use the lucene
org.apache.lucene.analysis.KeywordAnalyzer, when I would rather use the
org.apache.lucene.analysis.standard.StandardAnalyzer and the power of
the default query parser to perform a phrase query such as my_field:(the
value) or my_field:"the value", which don't work?

So is there a way to prevent tokenisation of a field using the
StandardAnalyzer, without implementing your own TokenizerFactory?

Regards

Ben


RE: solr not finding all results

2007-10-15 Thread Ben Shlomo, Yatir
Did you try to add a backslash to escape the - in Geckoplp4-M
(Geckoplp4\-M)


-Original Message-
From: Kevin Lewandowski [mailto:[EMAIL PROTECTED] 
Sent: Friday, October 12, 2007 9:40 PM
To: solr-user@lucene.apache.org
Subject: solr not finding all results

I've found an odd situation where solr is not returning all of the
documents that I think it should. A search for Geckoplp4-M returns 3
documents but I know that there are at least 100 documents with that
string.

Here is an example query for that phrase and the result set:
http://localhost:9020/solr/select/?q=Geckoplp4-M&version=2.2&start=0&rows=10&indent=on&fl=comments,id
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">0</int>
 <lst name="params">
  <str name="rows">10</str>
  <str name="start">0</str>
  <str name="indent">on</str>
  <str name="fl">comments,id</str>
  <str name="q">Geckoplp4-M</str>
  <str name="version">2.2</str>
 </lst>
</lst>
<result name="response" numFound="3" start="0">
 <doc>
  <str name="comments">Geckoplp4-M</str>
  <str name="id">m2816500</str>
 </doc>
 <doc>
  <str name="comments">toptrax recordings. Same tracks. Geckoplp4-M</str>
  <str name="id">m2816544</str>
 </doc>
 <doc>
  <str name="comments">Geckoplp4-M</str>
  <str name="id">m2815903</str>
 </doc>
</result>
</response>

Now here's an example of a search for two documents that I know have
that string, but were not returned in the previous search:
http://localhost:9020/solr/select/?q=id%3Am2816615+OR+id%3Am2816611&version=2.2&start=0&rows=10&indent=on&fl=id,comments
<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
 <int name="status">0</int>
 <int name="QTime">1</int>
 <lst name="params">
  <str name="rows">10</str>
  <str name="start">0</str>
  <str name="indent">on</str>
  <str name="fl">id,comments</str>
  <str name="q">id:m2816615 OR id:m2816611</str>
  <str name="version">2.2</str>
 </lst>
</lst>
<result name="response" numFound="2" start="0">
 <doc>
  <str name="comments">Geckoplp4-M</str>
  <str name="id">m2816611</str>
 </doc>
 <doc>
  <str name="comments">Geckoplp4-M</str>
  <str name="id">m2816615</str>
 </doc>
</result>
</response>

Here is the definition for the comments field:
<field name="comments" type="text" indexed="true" stored="true"/>

And here is the definition for a text field:
<fieldtype name="text" class="solr.TextField" positionIncrementGap="100">
  <analyzer type="index">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <!-- in this example, we will only use synonyms at query time
    <filter class="solr.SynonymFilterFactory" synonyms="index_synonyms.txt" ignoreCase="true" expand="false"/>
    -->
    <!--<filter class="solr.StopFilterFactory" ignoreCase="true"/>-->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="1" catenateNumbers="1" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
  <analyzer type="query">
    <tokenizer class="solr.WhitespaceTokenizerFactory"/>
    <filter class="solr.SynonymFilterFactory" synonyms="synonyms.txt" ignoreCase="true" expand="true"/>
    <!--<filter class="solr.StopFilterFactory" ignoreCase="true"/>-->
    <filter class="solr.WordDelimiterFilterFactory" generateWordParts="1" generateNumberParts="1" catenateWords="0" catenateNumbers="0" catenateAll="0"/>
    <filter class="solr.LowerCaseFilterFactory"/>
    <!--<filter class="solr.EnglishPorterFilterFactory" protected="protwords.txt"/>-->
    <filter class="solr.RemoveDuplicatesTokenFilterFactory"/>
    <filter class="solr.ISOLatin1AccentFilterFactory"/>
  </analyzer>
</fieldtype>

Any ideas? Am I doing something wrong?

thanks,
Kevin


I can't delete, why?

2007-09-25 Thread Ben Shlomo, Yatir
Hi!
I know I can delete multiple docs with the following:
<delete><query>mediaId:(6720 OR 6721 OR ...)</query></delete>

My question is can I do something like this?
<delete><query>languageId:123 AND manufacturer:456</query></delete>
(It does not work for me and I didn't forget to commit)


How can I do it ? with copy field ?
<delete><query>languageIdmanufacturer:123456</query></delete>
Thanks
yatir


problem with quering solr after indexing UTF-8 encoded CSV files

2007-08-20 Thread Ben Shlomo, Yatir
Hi!

 

I have utf-8 encoded data inside a csv file (actually it’s a tab separated file 
- attached)

I can index it with no apparent errors

I did not forget to set this in my tomcat configuration

 

 

<Server ...>
 <Service ...>
   <Connector ... URIEncoding="UTF-8"/>

 

When I query  a document using the UTF-8 text I get zero matches: 

 

http://localhost:8080/apache-solr-1.2.1-dev/select/?q=%D7%99%D7%AA%D7%99%D7%A8&version=2.2&start=0&rows=10&indent=on

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">יתיר</str>  <!-- note: I can see the correct UTF-8 text (Hebrew characters) here -->
    <str name="rows">10</str>
    <str name="version">2.2</str>
  </lst>
</lst>
<result name="response" numFound="0" start="0"/>
</response>


When I observe this text in the response by querying for *:*

I notice that the text does not appear as desired: it comes back as garbled characters instead of יתיר

Do you have any ideas?
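
One thing I still want to try (just a sketch; the file name is made up and it assumes the first line of the file holds the field names) is re-posting the file with an explicit UTF-8 charset on the request, in case the body is being read with the wrong encoding:

curl "http://localhost:8080/apache-solr-1.2.1-dev/update/csv?separator=%09&commit=true" --data-binary @data.tsv -H 'Content-type: text/plain; charset=utf-8'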

Thanks…

 

Here is the response :

 

http://localhost:8080/apache-solr-1.2.1-dev/select/?q=*%3A*&version=2.2&start=0&rows=10&indent=on

<?xml version="1.0" encoding="UTF-8"?>
<response>
<lst name="responseHeader">
  <int name="status">0</int>
  <int name="QTime">0</int>
  <lst name="params">
    <str name="indent">on</str>
    <str name="start">0</str>
    <str name="q">*:*</str>
    <str name="rows">10</str>
    <str name="version">2.2</str>
  </lst>
</lst>
<result name="response" numFound="1" start="0">
  <doc>
    <str name="country">1</str>
    <str name="desc">desc is a very good camera</str>
    <str name="dispname">display is יתיר ABC res123 </str>
    <str name="form">1</str>
    <str name="lang">1</str>
    <str name="manu">ABC</str>
    <str name="model"> res123 </str>
    <str name="pn">C123</str>
    <str name="productid">123456</str>
    <str name="upc">72900010123</str>
  </doc>
</result>
</response>


yatir



question: how to divide the indexing into separate domains

2007-08-09 Thread Ben Shlomo, Yatir
Hi!

say I have 300 csv files that I need to index. 

Each one holds millions of lines (each line is a few fields separated by
commas)

Each csv file represents a different domain of data (e.g. file1 is
computers, file2 is flowers, etc.)

There is no indication of the domain ID in the data inside the csv file

 

When I search I would like to specify the id of a specific domain

And I want solr to search only in this domain - to save time and reduce
the number of matches

I need to specify, during indexing, the domain id of the csv file being
indexed.

How do I do it?


Thanks 


p.s. 

I wish I could index like this:

curl "http://localhost:8080/solr/update/csv?stream.file=test.csv&fieldnames=field1,field2&f.domain.value=98765"
(where 98765 is the domain id for this specific csv file)
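
Failing that, the workaround I'm considering (an untested sketch; it assumes a "domain" field has been added to the schema and that the field values contain no embedded commas) is to prepend a constant domain column to each file before indexing it, and then restrict searches with a filter query:

awk '{print "98765," $0}' test.csv > test.98765.csv
curl "http://localhost:8080/solr/update/csv?stream.file=test.98765.csv&fieldnames=domain,field1,field2&commit=true"

http://localhost:8080/solr/select/?q=flowers&fq=domain:98765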



separate log files

2007-01-15 Thread Ben Incani
Hi Solr users,

I'm running multiple instances of Solr, which all load from the same war
file.

Below is an example of the servlet context file used for each
application.

<Context path="/app1-solr" docBase="/var/usr/solr/solr-1.0.war"
debug="0" crossContext="true">
  <Environment name="solr/home" type="java.lang.String"
  value="/var/local/app1" override="true" />
</Context>

Hence each application is using the same
WEB-INF/classes/logging.properties file to configure logging.
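
(For reference, the relevant part of such a logging.properties looks roughly like this; the handler list, levels and directory here are assumed, not copied from my file:)

handlers = org.apache.juli.FileHandler, java.util.logging.ConsoleHandler

org.apache.juli.FileHandler.level = INFO
org.apache.juli.FileHandler.directory = ${catalina.base}/logs
org.apache.juli.FileHandler.prefix = solr.

java.util.logging.ConsoleHandler.level = INFO
java.util.logging.ConsoleHandler.formatter = java.util.logging.SimpleFormatter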

I would like each instance to log to a separate log file, such as:
app1-solr.yyyy-mm-dd.log
app2-solr.yyyy-mm-dd.log
...

Is there an easy way to append the context path to
org.apache.juli.FileHandler.prefix?
E.g. 
org.apache.juli.FileHandler.prefix = ${catalina.context}-solr.
 
Or would this require a code change?

Regards

-Ben