Re: Decompound German Words

2012-05-06 Thread Martin Frank
Dear Satish,

did you find a decompounding dictionary for German?
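For context, dictionary-based decompounding (in Lucene/Solr, `DictionaryCompoundWordTokenFilter`, configured through `DictionaryCompoundWordTokenFilterFactory`) checks substrings of a token against a word list and emits each match as an extra token. The Python sketch below shows only that basic idea and is not Lucene's code. The tiny word list and the size limits are hypothetical, and a real German setup needs exactly the kind of large dictionary asked about here.

```python
def decompound(word, dictionary, min_word_size=5, min_subword=2, max_subword=15):
    """Return the original token plus every dictionary word found inside it.

    Simplified sketch: every substring whose length lies in
    [min_subword, max_subword] is looked up in the word list.
    No longest-match or hyphenation logic is applied.
    """
    token = word.lower()  # assumption: the word list is lowercase
    tokens = [word]  # the original compound is always kept
    if len(token) < min_word_size:
        return tokens  # too short to be worth splitting
    for start in range(len(token) - min_subword + 1):
        longest = min(max_subword, len(token) - start)
        for length in range(min_subword, longest + 1):
            candidate = token[start:start + length]
            if candidate in dictionary:
                tokens.append(candidate)
    return tokens


# Hypothetical toy word list; a real one would hold many thousands of entries.
GERMAN_WORDS = {"donau", "dampf", "schiff", "fahrt"}

print(decompound("Donaudampfschifffahrt", GERMAN_WORDS))
# → ['Donaudampfschifffahrt', 'donau', 'dampf', 'schiff', 'fahrt']
```

Because the lookup works on raw substrings, the triple "f" in "Schifffahrt" is still split correctly into "schiff" and "fahrt".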

Best Regards
Martin

--
View this message in context: 
http://lucene.472066.n3.nabble.com/Decompound-German-Words-tp3708194p3966013.html
Sent from the Solr - User mailing list archive at Nabble.com.


Solritas in production

2012-05-06 Thread András Bártházi
Hi,

We're currently evaluating Solr as a Sphinx replacement. Our site is a
real estate search engine with 1,000,000+ page views a day. The
development is almost done, and it seems to be working fine; however, some
of my colleagues have come up with the idea that we're using it wrong. We're using
it as a service from PHP/Symfony.

They think we should use Solritas as the frontend, so that site visitors
access it directly, no PHP is involved, and we need much less
infrastructure. One of them said that even mobile.de uses it that way (I
have found no evidence of that at all).

Do you think it is a good idea?

Do you know of any services using Solritas as the frontend of a public site?

My personal opinion is that using Solritas in production is a very bad idea
for us, but I don't have much experience with Solr yet, and the Solritas
documentation is far from detailed or up to date, so I don't really know
what it is actually suitable for.

Thanks,
  Andras


Re: question about NRT(soft commit) and Transaction Log in trunk

2012-05-06 Thread Michael McCandless
This is a good question...

I don't know much about how Solr's transaction log works, but peeking
at the code, I do see it fsync'ing (look in TransactionLog.java, in
the finish method), but only if the SyncLevel is FSYNC.

If the default really is FLUSH, I don't see how the transaction log
helps with recovery...?

Should we change the default to FSYNC?

Mike McCandless

http://blog.mikemccandless.com


On Sat, Apr 28, 2012 at 7:11 AM, Li Li fancye...@gmail.com wrote:
 Hi,
   I checked out trunk and played with its new soft commit
 feature. It's cool, but I have a few questions about it.
   From reading some introductory articles and the wiki, plus some hasty code
 reading, my understanding of its implementation is:
   For a normal (hard) commit, everything is flushed to disk and then
 committed. Flushing is not very time-consuming because of the
 OS-level cache; the most time-consuming part is the sync in the commit process.
   A soft commit just flushes postings and pending deletions to disk,
 generating new segments. Solr can then open a
 new searcher to read the latest index, warm it up, and register it.
   If there is no hard commit and the JVM crashes, new data may be lost.
   If my understanding is correct, why do we need the transaction log?
   I found that in DirectUpdateHandler2, every time a command is executed,
 TransactionLog records a line in the log. But the default
 sync level in RunUpdateProcessorFactory is flush, which means the log
 file is not synced. Does this make sense?
   In database implementations, we usually write a log and modify data
 in memory, because the log is smaller than the real data. If there is a crash,
 we can redo the unfinished log entries and make the data correct. Will Solr
 use its log like this? If so, why isn't it synced?
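The distinction at the heart of this question, flushing a log record to the OS versus fsync'ing it to stable storage, can be illustrated with a generic write-ahead log. This is a minimal Python sketch under our own assumptions (one JSON command per line, a hypothetical `fsync` flag standing in for the FLUSH/FSYNC choice, replay by re-reading the file); it is not Solr's `TransactionLog`. In general, after a flush the record sits in the OS page cache, which survives a crash of the process (such as the JVM) but not an OS crash or power loss; only fsync asks the OS to put it on stable storage.

```python
import json
import os
import tempfile


class WriteAheadLog:
    """Toy append-only log: one JSON-encoded update command per line."""

    def __init__(self, path, fsync=False):
        self.path = path
        self.fsync = fsync  # True ~ "FSYNC" level, False ~ "FLUSH" level
        self._f = open(path, "a", encoding="utf-8")

    def append(self, command):
        self._f.write(json.dumps(command) + "\n")
        # flush(): move the bytes from the user-space buffer into the OS.
        # They now survive a crash of this process, but may still be lost
        # if the OS crashes or power fails before the page cache is written.
        self._f.flush()
        if self.fsync:
            # fsync(): ask the OS to push the data to stable storage.
            os.fsync(self._f.fileno())

    def close(self):
        self._f.close()


def replay(path):
    """Re-read logged commands (e.g. after a crash) so they can be redone."""
    commands = []
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if not line:
                continue  # skip blank lines
            try:
                commands.append(json.loads(line))
            except json.JSONDecodeError:
                break  # torn final record from a crash: stop replaying here
    return commands


if __name__ == "__main__":
    path = os.path.join(tempfile.mkdtemp(), "tlog.0")
    log = WriteAheadLog(path, fsync=True)
    log.append({"add": {"id": "doc1"}})
    log.append({"delete": {"id": "doc0"}})
    log.close()
    print(replay(path))
```

The replay step is what makes the log useful for recovery; whether it is trustworthy after a power loss depends on the fsync choice.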


Partition Question

2012-05-06 Thread Yuval Dotan
Hi All
We have an index of ~2,000,000,000 documents, and the query and facet times
are too slow for us.
Before turning to sharding to improve performance, we are considering
the multicore feature (our goal is to maximize performance on
a single machine).
Most of our queries will be restricted by time, so we want to partition the
data by date/time.
We also want to partition the data because the index is too big to
fit into memory (80 GB).

1. Is multicore the best way to implement this requirement?
2. I noticed there are LOAD/UNLOAD actions on a core. Should I use
these actions when managing my cores? If so, how can I LOAD a core that I
have unloaded?
For example:
I have 7 partitions/cores, one for each day of the week.
In most cases I will search only the core for the last day.
Once every 1 queries I need documents from all cores.
Question: Do I need to unload all of the old cores and then load them on
demand (when I see that I need data from these cores)?
3. If the answer to the last question is no, how do I ensure that the only
cores loaded into memory are the ones I want?

Thanks
Yuval


Re: Invalid version (expected 2, but 60) on CentOS in production please Help!!!

2012-05-06 Thread Ravi Solr
Thank you very much for responding, Mr. Erickson. You may be right about the
old-version index; I will reindex. However, we have two
separate/disjoint master-slave setups, and only one query node/slave has
this issue. If the indexes were really incompatible, why isn't the other
query server also throwing errors? That's what is throwing my
debugging thought process off.
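One way to read the error message itself: as far as we can tell, the javabin decoder that SolrJ uses by default treats the first byte of the response as a format version and expects `2`. The value `60` is the ASCII code for `<`, which suggests the client received XML or HTML (for example an error page) rather than javabin. That would point at what the server returned at that moment, not necessarily at the index format. The minimal Python sketch below, built around a hypothetical `classify_response` helper, performs only that first-byte check on a raw response body.

```python
JAVABIN_VERSION = 2  # the version byte the error message says it expected


def classify_response(body: bytes) -> str:
    """Guess the format of a raw Solr response from its first byte."""
    if not body:
        return "empty"
    first = body[0]  # indexing bytes yields an int in Python 3
    if first == JAVABIN_VERSION:
        return "javabin"
    if first == ord("<"):  # ord("<") == 60
        return "xml/html (possibly an error page)"
    return f"unknown (first byte {first})"


print(classify_response(b"<html><body>HTTP 500</body></html>"))
# → xml/html (possibly an error page)
print(classify_response(bytes([2, 0])))
# → javabin
```

Capturing the raw body of a failing request on the affected slave, and checking it this way, could confirm or rule out that explanation.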

Thanks

Ravi Kiran Bhaskar
Principal Software Engineer
Washington Post Digital
1150 15th Street NW, Washington, DC 20071

On Sat, May 5, 2012 at 12:53 PM, Erick Erickson erickerick...@gmail.com wrote:
 The first thing I'd check is whether, in the log, there is a replication happening
 immediately prior to the error. I confess I'm not entirely up on the
 version thing, but is it possible you're replicating an index that
 was built with some other version of Solr?

 That would at least explain your statement that it runs OK, but then
 fails sometime later.

 Best
 Erick

 On Fri, May 4, 2012 at 1:50 PM, Ravi Solr ravis...@gmail.com wrote:
 Hello,
         We recently migrated the OS of our Solr 3.6 servers from Solaris
 to CentOS, and since then we have been seeing "Invalid version
 (expected 2, but 60)" errors on one of the query servers (oddly, the
 other query server seems fine). If we restart the affected server,
 everything is all right, but the next morning we get the same
 exception again. I made sure that all the client applications
 are using Solr 3.6.

 The GlassFish server on which all the applications and Solr are deployed
 uses Java 1.6.0_29. The only difference I could see is:

 1. The process indexing to the server that has issues uses Java 1.6.0_31.
 2. The process indexing to the server that DOES NOT have issues uses
 Java 1.6.0_29.

 Could the indexing process's Java update version being newer than the
 Solr instance's be the cause of this issue?

 Can anybody please help me debug this a bit more? What else can I
 look at to understand the underlying problem? The stack trace is given
 below:


 [#|2012-05-04T09:58:43.985-0400|SEVERE|sun-appserver2.1.1|xxx...|_ThreadID=32;_ThreadName=httpSSLWorkerThread-9001-7;_RequestID=a19f92cc-2a8c-47e8-b159-a20330f14af5;
 org.apache.solr.client.solrj.SolrServerException: Error executing query
        at org.apache.solr.client.solrj.request.QueryRequest.process(QueryRequest.java:95)
        at org.apache.solr.client.solrj.SolrServer.query(SolrServer.java:311)
        at com.wpost.ipad.feeds.FeedController.findLinksetNewsBySection(FeedController.java:743)
        at com.wpost.ipad.feeds.FeedController.findNewsBySection(FeedController.java:347)
        at sun.reflect.GeneratedMethodAccessor282.invoke(Unknown Source)
        at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
        at java.lang.reflect.Method.invoke(Method.java:597)
        at org.springframework.web.bind.annotation.support.HandlerMethodInvoker.invokeHandlerMethod(HandlerMethodInvoker.java:175)
        at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.invokeHandlerMethod(AnnotationMethodHandlerAdapter.java:421)
        at org.springframework.web.servlet.mvc.annotation.AnnotationMethodHandlerAdapter.handle(AnnotationMethodHandlerAdapter.java:409)
        at org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:774)
        at org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:719)
        at org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:644)
        at org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:549)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:734)
        at javax.servlet.http.HttpServlet.service(HttpServlet.java:847)
        at org.apache.catalina.core.ApplicationFilterChain.servletService(ApplicationFilterChain.java:427)
        at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:315)
        at org.apache.catalina.core.StandardContextValve.invokeInternal(StandardContextValve.java:287)
        at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:218)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
        at com.sun.enterprise.web.WebPipeline.invoke(WebPipeline.java:94)
        at com.sun.enterprise.web.PESessionLockingStandardPipeline.invoke(PESessionLockingStandardPipeline.java:98)
        at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:222)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:648)
        at org.apache.catalina.core.StandardPipeline.doInvoke(StandardPipeline.java:593)
        at org.apache.catalina.core.StandardPipeline.invoke(StandardPipeline.java:587)
        at 
 

Re: Solritas in production

2012-05-06 Thread Jan Høydahl
Hi,

Solritas (the Velocity Response Writer) is NOT intended for production use. The
simple reason, apart from the fact that it is not of production-grade quality, is
that it requires direct access to the Solr instance, as it is simply a response
writer. You MUST use a separate front-end layer above Solr and never expose Solr
directly to the world. So you should feel totally comfortable continuing to use
Solr over HTTP from PHP!

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 6 May 2012, at 14:02, András Bártházi wrote:

 Hi,
 
 We're currently evaluating Solr as a Sphinx replacement. Our site is a
 real estate search engine with 1,000,000+ page views a day. The
 development is almost done, and it seems to be working fine; however, some
 of my colleagues have come up with the idea that we're using it wrong. We're using
 it as a service from PHP/Symfony.
 
 They think we should use Solritas as the frontend, so that site visitors
 access it directly, no PHP is involved, and we need much less
 infrastructure. One of them said that even mobile.de uses it that way (I
 have found no evidence of that at all).
 
 Do you think it is a good idea?
 
 Do you know of any services using Solritas as the frontend of a public site?
 
 My personal opinion is that using Solritas in production is a very bad idea
 for us, but I don't have much experience with Solr yet, and the Solritas
 documentation is far from detailed or up to date, so I don't really know
 what it is actually suitable for.
 
 Thanks,
  Andras



Re: Partition Question

2012-05-06 Thread Jan Høydahl
Hi,

First you need to investigate WHY faceting and querying are too slow.
What exactly do you mean by slow? Can you please tell us more about your setup?
* How large are your documents, and how many fields do they have?
* What kind of queries? How many hits? How many facets? Have you studied the
debugQuery=true output?
* Do you use filter queries (fq) extensively?
* What data do you facet on? Many unique values per field? Text or ranges? Which
facet.method?
* What kind of hardware? RAM/CPU
* How have you configured your JVM? How much memory? GC?
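Several of these questions can be answered from a single request that combines the parameters mentioned above. The Python sketch below only builds such a request URL. The host, core name (`week`), field names (`timestamp`, `category`) and values are hypothetical placeholders, while `q`, `fq`, `facet`, `facet.field`, `facet.method`, `rows`, `debugQuery` and `wt` are standard Solr request parameters.

```python
from urllib.parse import urlencode


def diagnostic_url(base, q, filters, facet_fields, facet_method="fc"):
    """Build a /select URL that also returns debug/timing information."""
    params = [("q", q)]
    # Each fq is a separate parameter and is cached separately in the filterCache.
    params += [("fq", f) for f in filters]
    params += [("facet", "true"), ("facet.method", facet_method)]
    params += [("facet.field", f) for f in facet_fields]
    params += [("rows", "10"), ("debugQuery", "true"), ("wt", "json")]
    return base + "/select?" + urlencode(params)


# Hypothetical host, core and field names.
print(diagnostic_url(
    "http://localhost:8983/solr/week",
    q="*:*",
    filters=["timestamp:[NOW/DAY-1DAY TO NOW]"],
    facet_fields=["category"],
))
```

Running the same query with `facet_method="enum"` and comparing the timing sections of the debug output is one cheap way to see whether the facet method matters for a given field.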

As you can see, you will have to provide a lot more information about your use case
and setup for us to judge the correct action to take. You might need to
adjust your config, optimize your queries or caches, slim down your schema,
or buy some more RAM or an SSD :)

Going multicore on one box will not necessarily help in itself, as
there is overhead in sharding across multiple cores as well. However, it COULD be a
solution, since you say that most of the time you only need to consider 1/7 of
your data. I would perhaps consider one hot core for the last 24 hours and one
archive core for older data. You could then tune these differently with respect
to caches, etc.
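The hot/archive split could be driven from the client: queries that only touch the last 24 hours go to the hot core alone, and anything older becomes a distributed request over both cores using the standard `shards` parameter. This is a minimal Python sketch assuming hypothetical core names `hot` and `archive` on one host, and assuming some separate process moves documents older than 24 hours from the hot core to the archive core. The distributed-search overhead mentioned above still applies to the second case.

```python
from datetime import datetime, timedelta

HOST = "localhost:8983/solr"  # hypothetical host and context path
HOT_WINDOW = timedelta(hours=24)


def cores_for(range_start, now):
    """Pick the core(s) a time-restricted query needs to touch."""
    if range_start >= now - HOT_WINDOW:
        return ["hot"]
    return ["hot", "archive"]


def select_request(range_start, now):
    """Return (base URL, extra params) for a query starting at range_start."""
    cores = cores_for(range_start, now)
    base = f"http://{HOST}/{cores[0]}/select"
    if len(cores) == 1:
        return base, {}
    # shards entries are host:port/path/core, without the http:// prefix
    return base, {"shards": ",".join(f"{HOST}/{c}" for c in cores)}


now = datetime(2012, 5, 6, 17, 0)
print(select_request(now - timedelta(hours=3), now))
print(select_request(now - timedelta(days=5), now))
```

Keeping the routing rule in one function makes it easy to change later, for example to a per-day core layout.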

Can you get back with some more details?

--
Jan Høydahl, search solution architect
Cominvent AS - www.cominvent.com
Solr Training - www.solrtraining.com

On 6 May 2012, at 17:07, Yuval Dotan wrote:

 Hi All
 We have an index of ~2,000,000,000 documents, and the query and facet times
 are too slow for us.
 Before turning to sharding to improve performance, we are considering
 the multicore feature (our goal is to maximize performance on
 a single machine).
 Most of our queries will be restricted by time, so we want to partition the
 data by date/time.
 We also want to partition the data because the index is too big to
 fit into memory (80 GB).
 
 1. Is multicore the best way to implement this requirement?
 2. I noticed there are LOAD/UNLOAD actions on a core. Should I use
 these actions when managing my cores? If so, how can I LOAD a core that I
 have unloaded?
 For example:
 I have 7 partitions/cores, one for each day of the week.
 In most cases I will search only the core for the last day.
 Once every 1 queries I need documents from all cores.
 Question: Do I need to unload all of the old cores and then load them on
 demand (when I see that I need data from these cores)?
 3. If the answer to the last question is no, how do I ensure that the only
 cores loaded into memory are the ones I want?
 
 Thanks
 Yuval



Re: Solritas in production

2012-05-06 Thread Marcelo Carvalho Fernandes
Hi Jan,

I would have answered András exactly the opposite :-) I'd like to understand
your reasoning, so let me ask you something.

Would you see any problem if he had Apache httpd configured as a reverse
proxy (no PHP in it) in front of Solr, just to restrict user access to
the Solritas URL only? That way Solr would not be directly exposed, and he
would not need to develop a PHP site/application.

Maybe a Varnish layer would be even better, as he has 1,000,000+ page views a
day. Again, no PHP in this scenario.

What's your opinion about both solutions?

Thanks in advance,


Marcelo Carvalho Fernandes
+55 21 8272-7970
+55 21 2205-2786


On Sun, May 6, 2012 at 7:42 PM, Jan Høydahl jan@cominvent.com wrote:

 Hi,

 Solritas (the Velocity Response Writer) is NOT intended for production use.
 The simple reason, apart from the fact that it is not of production-grade quality,
 is that it requires direct access to the Solr instance, as it is simply a
 response writer. You MUST use a separate front-end layer above Solr and
 never expose Solr directly to the world. So you should feel totally
 comfortable continuing to use Solr over HTTP from PHP!

 --
 Jan Høydahl, search solution architect
 Cominvent AS - www.cominvent.com
 Solr Training - www.solrtraining.com





Re: Solritas in production

2012-05-06 Thread Walter Underwood
Do not directly expose Solr to WWW traffic. It isn't designed for that.

For example, the admin pages have no access controls.

I can change my request parameters to request a million rows and put a huge 
load on your server. A few of those, and you are off the air.

I can fetch your config, then send a command to DIH to do a full import.

And so on.
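These points are why a front-end layer typically forwards only a whitelisted set of parameters to a single fixed search handler, rather than proxying arbitrary URLs. The Python sketch below illustrates that idea under stated assumptions: the core name, the allowed parameter names, the 100-row cap and the paging limit are hypothetical choices, and a real deployment would also block network access to Solr itself.

```python
from urllib.parse import urlencode

# Hypothetical values: core name, whitelist and limits are illustrative only.
SOLR_SELECT = "http://localhost:8983/solr/listings/select"
ALLOWED = {"q", "start", "rows"}
MAX_ROWS = 100  # blocks "rows=1000000"-style requests
MAX_START = 10_000  # limits deep paging


def _int_param(value, default, upper):
    """Parse a visitor-supplied integer and clamp it to [0, upper]."""
    try:
        number = int(value)
    except (TypeError, ValueError):
        return default
    return max(0, min(number, upper))


def build_solr_url(user_params):
    """Turn untrusted visitor parameters into a restricted Solr query URL."""
    # Drop everything not explicitly allowed (qt, shards, stream.*, ...).
    params = {k: v for k, v in user_params.items() if k in ALLOWED}
    params["rows"] = _int_param(params.get("rows"), 10, MAX_ROWS)
    params["start"] = _int_param(params.get("start"), 0, MAX_START)
    # q is passed through unchanged here; local params such as {!...}
    # inside q may need filtering too in a real front end.
    params.setdefault("q", "*:*")
    params["wt"] = "json"  # response format is fixed by the front end
    # The path is fixed, so /admin, /update or /dataimport are never reached.
    return SOLR_SELECT + "?" + urlencode(sorted(params.items()))


print(build_solr_url({"q": "budapest", "rows": "1000000", "qt": "/dataimport"}))
# → http://localhost:8983/solr/listings/select?q=budapest&rows=100&start=0&wt=json
```

A reverse proxy with URL rules can enforce part of this, but clamping numeric parameters and dropping unknown ones is much easier in a small application layer.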

wunder

On May 6, 2012, at 5:50 PM, Marcelo Carvalho Fernandes wrote:

 Hi Jan,
 
 I would have answered András exactly the opposite :-) I'd like to understand
 your reasoning, so let me ask you something.
 
 Would you see any problem if he had Apache httpd configured as a reverse
 proxy (no PHP in it) in front of Solr, just to restrict user access to
 the Solritas URL only? That way Solr would not be directly exposed, and he
 would not need to develop a PHP site/application.
 
 Maybe a Varnish layer would be even better, as he has 1,000,000+ page views a
 day. Again, no PHP in this scenario.
 
 What's your opinion about both solutions?
 
 Thanks in advance,
 
 
 Marcelo Carvalho Fernandes
 +55 21 8272-7970
 +55 21 2205-2786
 
 

--
Walter Underwood
wun...@wunderwood.org





Re: Solritas in production

2012-05-06 Thread Radim Kolar

On 6.5.2012 14:02, András Bártházi wrote:

We're currently evaluating Solr as a Sphinx replacement. The
development is almost done, and it seems to be working fine

Why do you want to replace Sphinx with Solr?


Re: Solritas in production

2012-05-06 Thread András Bártházi
Hi,

  We're currently evaluating Solr as a Sphinx replacement. The
  development is almost done, and it seems to be working fine

 Why do you want to replace Sphinx with Solr?


(Is this on-topic here?)

Solr+Lucene has far more features, and seems to be more extensible as well.

- we had speed problems with faceting (although I know it can be
implemented); Solr's faceting looks promising
- we had to use workarounds to implement some other features on the
site; the Solr versions of these are simpler
- and we lack a good solution for scalability and availability like
SolrCloud.

Anyway, Sphinx works really well, but Solr seems to be better for us.

Bye,
  Andras


Re: Solritas in production

2012-05-06 Thread Otis Gospodnetic
Hi,

- Original Message -

 On 6.5.2012 14:02, András Bártházi wrote:
  We're currently evaluating Solr as a Sphinx replacement. The
  development is almost done, and it seems to be working fine
 Why do you want to replace Sphinx with Solr?



That's an interesting question. Until recently, I never saw any Sematext
clients coming to us wanting to replace Sphinx. Then one company came to us
with this request, and I jokingly remarked that we never see people using Sphinx.
Then suddenly, in the last few months, we've had a number of clients who
mentioned Sphinx in one context or another.

Is something (not) happening with Sphinx?

Otis

Performance Monitoring for Solr / ElasticSearch / HBase - 
http://sematext.com/spm