Re: Help me understand these newrelic graphs

2014-03-13 Thread Otis Gospodnetic
It really depends; it's hard to give a definitive answer without more
information.
e.g. if your CPUs are all maxed out and you already have a high number of
concurrent queries, then sharding may not be of any help at all.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Mar 13, 2014 at 7:42 PM, Software Dev wrote:

> Ahh... it's including the add operation. That makes sense then. A bit silly
> on NR's part that they don't break it down.
>
> Otis, our index is only 8G so I don't consider that big by any means but
> our queries can get a bit complex with a bit of faceting. Do you still
> think it makes sense to shard? How easy would this be to get working?
>
>
> On Thu, Mar 13, 2014 at 4:02 PM, Otis Gospodnetic <
> otis.gospodne...@gmail.com> wrote:
>
> > Hi,
> >
> > I think NR has support for breaking by handler, no?  Just checked - no.
> >  Only webapp controller, but that doesn't apply to Solr.
> >
> > SPM should be more helpful when it comes to monitoring Solr - you can
> > filter by host, handler, collection/core, etc. -- you can see the demo -
> > https://apps.sematext.com/demo - though this is plain Solr, not
> SolrCloud.
> >
> > If your index is big or queries are complex, shard it and parallelize
> > search.
> >
> > Otis
> > --
> > Performance Monitoring * Log Analytics * Search Analytics
> > Solr & Elasticsearch Support * http://sematext.com/
> >
> >
> > On Thu, Mar 13, 2014 at 6:17 PM, ralph tice 
> wrote:
> >
> > > I think your response time is including the average response for an add
> > > operation, which generally returns very quickly and due to sheer number
> > are
> > > averaging out the response time of your queries.  New Relic should
> break
> > > out requests based on which handler they're hitting but they don't seem
> > to.
> > >
> > >
> > > On Thu, Mar 13, 2014 at 2:18 PM, Software Dev <
> static.void@gmail.com
> > > >wrote:
> > >
> > > > Here are some screen shots of our Solr Cloud cluster via Newrelic
> > > >
> > > > http://postimg.org/gallery/2hyzyeyc/
> > > >
> > > > We currently have a 5 node cluster and all indexing is done on
> separate
> > > > machines and shipped over. Our machines are running on SSD's with 18G
> > of
> > > > ram (Index size is 8G). We only have 1 shard at the moment with
> > replicas
> > > on
> > > > all 5 machines. I'm guessing thats a bit of a waste?
> > > >
> > > > How come when we do our bulk updating the response time actually
> > > decreases?
> > > > I would think the load would be higher therefor response time should
> be
> > > > higher. Any way I can decrease the response time?
> > > >
> > > > Thanks
> > > >
> > >
> >
>


Re: Zookeeper latencies and pending requests - Solr 4.3

2014-03-13 Thread Shawn Heisey
On 3/13/2014 7:24 PM, Chris W wrote:
> Any help on this is much appreciated. Is it better to use more cores for
> zookeeper (as opposed to 1 core machine)?

I would guess that disk latency is the biggest bottleneck for ZooKeeper.
Unless the SolrCloud install is quite large, I don't think much is
required in the way of CPU resources.

If the ZK instances are not on dedicated hardware, you could ensure
stellar performance by putting its database on one or more dedicated
disks attached to a controller with battery backed (or NVRAM) cache.

Thanks,
Shawn



solr securing index files

2014-03-13 Thread Prasi S
Hi,
Is there any way to secure the Solr index directory? I have many users on
a server and I want to restrict file access to the administrator only.

Does securing the index directory affect Solr's access to the folder?


Thanks,
Prasi


Re: Check my thinking on this, wildcard matching in phrases.

2014-03-13 Thread Alexandre Rafalovitch
Different but (conceptually) similar?
http://robotlibrarian.billdueber.com/2012/03/boosting-on-exactish-anchored-phrase-matching-in-solr-sst-4/index.html

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Mar 14, 2014 at 8:38 AM, Erick Erickson  wrote:
> or "why haven't I thought of this before"?
>
> I'm once again being faced with the recurring problem of phrase
> searches with wildcards. It'll lead to index bloat, but that's
> acceptable in this situation, at least until proved not so.
>
> The surround query parser can deal with wildcards and proximity, but
> it doesn't accept anything less than three leading characters, which
> is another problem in this case.
>
> I know the complex phrase query parser is out there, but it's not part
> of the code base.
>
> So I'm thinking of modifying the EdgeNGramFilter, I've coded up a
> prototype that seems to work. Basically, it just appends $ to all the
> grams _except_ the last one. I set maxGramSize to 1000, so we'll
> assume the final gram is the original term.
>
> So, indexing "my dog has fleas" I get
> pos 1 pos 2 pos 3   pos 4
> m$  d$ h$  f$
> my  do$   ha$fl$
>dog   has fle$
> flea$
> fleas
>
>
> Now, when users want to search for "m* fleas" within 5 words, they can
> search for :
> "m$ fleas"~5
> or
> "m$ fle$"~5
> or even
> "m$ do$ fle$"~3
>
>
> and they won't get false matches on something like
> "do ha"
>
> You have to accept some simplifications here, of course. This doesn't
> handle things like "fle*s" and the like.
>
> I'm also not sure this is general-purpose enough to make an option for
> EdgeNGramFilterFactory, the use-case is somewhat restricted. But
> that's a relatively natural fit, a new param like
> 'subGramAppendChar="$" '
>
> Thoughts?


Check my thinking on this, wildcard matching in phrases.

2014-03-13 Thread Erick Erickson
or "why haven't I thought of this before"?

I'm once again being faced with the recurring problem of phrase
searches with wildcards. It'll lead to index bloat, but that's
acceptable in this situation, at least until proved not so.

The surround query parser can deal with wildcards and proximity, but
it doesn't accept anything less than three leading characters, which
is another problem in this case.

I know the complex phrase query parser is out there, but it's not part
of the code base.

So I'm thinking of modifying the EdgeNGramFilter, I've coded up a
prototype that seems to work. Basically, it just appends $ to all the
grams _except_ the last one. I set maxGramSize to 1000, so we'll
assume the final gram is the original term.

So, indexing "my dog has fleas" I get
pos 1   pos 2   pos 3   pos 4
m$      d$      h$      f$
my      do$     ha$     fl$
        dog     has     fle$
                        flea$
                        fleas


Now, when users want to search for "m* fleas" within 5 words, they can
search for :
"m$ fleas"~5
or
"m$ fle$"~5
or even
"m$ do$ fle$"~3


and they won't get false matches on something like
"do ha"

You have to accept some simplifications here, of course. This doesn't
handle things like "fle*s" and the like.

I'm also not sure this is general-purpose enough to make an option for
EdgeNGramFilterFactory, the use-case is somewhat restricted. But
that's a relatively natural fit, a new param like
'subGramAppendChar="$" '

Thoughts?
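
A minimal sketch of the idea described above (not Erick's actual prototype --
the class name is made up, but the attribute API is the stock Lucene 4.x one):
each incoming term is expanded into its leading grams with '$' appended, plus
the unmarked full term, all stacked at the same position so the proximity
queries above behave as described.

import java.io.IOException;

import org.apache.lucene.analysis.TokenFilter;
import org.apache.lucene.analysis.TokenStream;
import org.apache.lucene.analysis.tokenattributes.CharTermAttribute;
import org.apache.lucene.analysis.tokenattributes.PositionIncrementAttribute;

public final class MarkedEdgeNGramFilter extends TokenFilter {

  private final CharTermAttribute termAtt = addAttribute(CharTermAttribute.class);
  private final PositionIncrementAttribute posIncAtt =
      addAttribute(PositionIncrementAttribute.class);

  private String currentTerm;  // term currently being expanded, null when none
  private int gramSize;        // length of the next gram to emit
  private int savedPosInc;     // position increment of the original token

  public MarkedEdgeNGramFilter(TokenStream input) {
    super(input);
  }

  @Override
  public boolean incrementToken() throws IOException {
    if (currentTerm == null) {
      if (!input.incrementToken()) {
        return false;                      // upstream exhausted
      }
      currentTerm = termAtt.toString();
      savedPosInc = posIncAtt.getPositionIncrement();
      gramSize = 1;
    }
    if (gramSize < currentTerm.length()) {
      // A leading gram, marked with '$' so it can never match a real short word.
      termAtt.setEmpty().append(currentTerm, 0, gramSize).append('$');
      posIncAtt.setPositionIncrement(gramSize == 1 ? savedPosInc : 0);
      gramSize++;
      return true;
    }
    // Finally the original, unmarked term, stacked at the same position.
    termAtt.setEmpty().append(currentTerm);
    posIncAtt.setPositionIncrement(currentTerm.length() == 1 ? savedPosInc : 0);
    currentTerm = null;
    return true;
  }

  @Override
  public void reset() throws IOException {
    super.reset();
    currentTerm = null;
  }
}

Wiring this into a factory with a subGramAppendChar parameter, as suggested,
would just mean making the '$' configurable.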


Re: Please Enable Wiki Editing

2014-03-13 Thread Alexandre Rafalovitch
What about SEO? If somebody gives me Google Analytics access, I would
be happy to dig around that for a while to see if people can actually
find stuff on the Wiki.

Regards,
   Alex.
Personal website: http://www.outerthoughts.com/
LinkedIn: http://www.linkedin.com/in/alexandrerafalovitch
- Time is the quality of nature that keeps events from happening all
at once. Lately, it doesn't seem to be working.  (Anonymous  - via GTD
book)


On Fri, Mar 14, 2014 at 8:18 AM, Erick Erickson  wrote:
> Done, thanks! We can always use more editors who contribute their 
> experiences...
>
> On Thu, Mar 13, 2014 at 8:01 PM, Greg Gilles  wrote:
>> Hi,
>>
>> Please update my account so I can edit the wiki https://wiki.apache.org/solr.
>>
>> GregG  /  greg22...@yahoo.com
>>
>> Specifically, I was installing Solr on Windows using Tomcat following the 
>> instructions on https://wiki.apache.org/solr/SolrInstall, and had some 
>> issues with the instructions and wanted to update them.  I've included the 
>> issues below to remind myself what changes need to be made to the wiki, or 
>> you can forward this email to whomever manages the install page so they can 
>> make the changes.
>>
>> Thanks -- Greg
>>
>> --
>>
>> Modifications from Stack Exchange that fixed several errors:
>>
>> 1. Changes: Remind users that on Windows, they need to copy over the logging 
>> jars to the tomcat lib directory or they will get a "severe error 
>> filterstart"  on tomcat start up under the Windows steps.
>>
>> 2. Users to need "copy up" a level the  collection1 and bin or the will get 
>> an error page when trying to go to the solar/admin page.
>>
>> 3. Users can modify startup.bat (if that's what they are using to start 
>> tomcat)  instead of setenv or web.xml  to include:
>>
>> set JAVA_OPTS=-Dsolr.solr.home=c:\solr_home\solr 
>> -Dsolr.velocity.enabled=false
>>
>>
>>
>> --
>> Stack Exchange link:
>>
>> http://stackoverflow.com/questions/17619809/installing-apache-solr-4-3-1-on-apache-tomcat-6-0
>>
>>
>>
>> Install Solr in Tomcat
>> Pre Requirements
>> 1 - Machine with Windows OS (Windows 7,8,Xp.. ..etc)
>> 2 - Java 6 or Above
>> 3 - Solr 4.0.0 or Above
>> 4 - Apache-tomcat 6 or Above.
>> Steps to get Solr up on Tomcat Server
>> 1.Install Tomcat on your machine and make sure it is ready to start.(Check 
>> using localhost:8080)
>> 2.Install Solr4.0 distribution package apache-solr-4.0.0.zip and unzip it in 
>> your local directory like C:\apache-solr-4.0.0.
>> 3.Make a folder with name solr-home in your local machine like C:\solr_home.
>> 4.Go back to the solr distribution package that you downloaded 
>> C:\apache-solr-4.0.0. Have a peek inside the Examples/solr 
>> ("C:\solr-4.4.0\example\solr") folder. Copy all those files into the 
>> C:\solr_home folder.(server shutting down exception will come)
>> 5.Look into C:\solr-home\solr and you will see two folders with name 
>> collection1 and bin, copy these two folders a step up to C:\solr_home.(if 
>> lib not copy "severe error filterstart" Exception come)
>> 6.Copy lib from C:\apache-solr-4.0.0\example\lib\ext SLF4J and log4j.jar 
>> file to Tomcat Lib folder C:\Program Files\Apache Software Foundation\Tomcat 
>> 6.0\lib 
>> (https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty)
>> 7.Copy apache-solr-4.0.war (rename to solr.war) from "C:\solr-4.4.0\dist" 
>> directory to webapps directory inside Tomcat.(C:\Program Files\Apache 
>> Software Foundation\Tomcat 6.0\webapps)
>> 8.If tomact is already start then solr folder will create go to "C:\Program 
>> Files\Apache Software Foundation\Tomcat 6.0\webapps\solr\WEB-INF\web.xml" 
>> edit web.xml uncomment entry and edit like following(Exception SolrCore 
>> 'collection1' is not available due to init failure)
>>  solr/home 
>> C:\solr_home\solr 
>> java.lang.String
>> 
>> 9.Start Tomcat and check localhost:8080/solr dashBoard will come


Re: Zookeeper latencies and pending requests - Solr 4.3

2014-03-13 Thread Chris W
Any help on this is much appreciated. Is it better to use more cores for
ZooKeeper (as opposed to a 1-core machine)?



On Wed, Mar 12, 2014 at 4:28 PM, Chris W  wrote:

> Hi Furkan
>
> Load on the network is very low when read workload is on the cluster.
> During indexing, a few of my "commits" get hung forever and the solr nodes
> are attempting to get connection from zookeeper. The peer communication
> between zk is very good and i havent seen any issues. The network transfer
> is around 15-20 mBps when i restart a solr node.
>
> *Infrastructure*: 10 node solrcloud cluster with 3 node zk ensemble
> (m1.medium instances with 1 core cpu, 1.5Gb of Heap out of total of 3Gb
> ram). Solr logs are in the same mount as the solr data and tlogs. Zk logs
> are also in the same mount as zk data. I have 80+ collections which can
> grow up to 150-200 easily.
>
> *Regarding ZK Data*
>
> Why does 50MB pose a problem if none of the system parameters are in an
> alarming state? I have around 80+ collections in solr and the every
> collection has the same schema but different solrconfig.xml.  Hence I am
> bundling every schema,config  into a different zk folder and pushing that
> as a separate config. Is there a way in solr/zookeeper to use one for
> common files (like velocity template, schema)  and push just the
> solrconfig.xml into another config directory? In the 50MB I am sure that
> atleast 90% of the data is duplicate across configs
>
> Kindly advise and thanks for your response
>
>
>
>
>
>
>
>
> On Wed, Mar 12, 2014 at 4:08 PM, Furkan KAMACI wrote:
>
>> Hi;
>>
>> FAQ page says that:
>>
>> *Q: I'm seeing lot's of session timeout exceptions - what to do?*
>> *A: Try raising the ZooKeeper session timeout by editing solr.xml - see
>> the
>> zkClientTimeout attribute. The minimum session timeout is 2 times your
>> ZooKeeper defined tickTime. The maximum is 20 times the tickTime. The
>> default tickTime is 2 seconds. You should avoiding raising this for no
>> good
>> reason, but it should be high enough that you don't see a lot of false
>> session timeouts due to load, network lag, or garbage collection pauses.
>> The default timeout is 15 seconds, but some environments might need to go
>> as high as 30-60 seconds*.
>>
>> So when you do that what is the load of your network? Do you get that
>> timeouts while heavy indexing or at an idle time? If not there should be a
>> network problem. Could you chech whether a problem exists "between" your
>> Zookeeper ensembles? On the other hand could you give some more
>> information
>> about your infrastructure and Solr logs? (PS: 50 mb data *may *cause a
>> problem for your architecture)
>>
>> Thanks;
>> Furkan KAMACI
>>
>>
>> 2014-03-13 0:57 GMT+02:00 Chris W :
>>
>> > Hi
>> >
>> >   I have a 3 node zk ensemble . I see a very high latency for zk
>> responses
>> > and also a lot of outstanding requests (in the order of 30-40)
>> >
>> > I also see that the requests are not going to all zookeeper nodes
>> equally.
>> > One node has more requests/connections than the others. I see that
>> CPU/Mem
>> > and disk usage limits are very normal (under 30% cpu, disk reads in the
>> > order of kb, jvm size is 2 Gb but it hasnt even reached 30% usage). The
>> > size of data in zk is around 50MB
>> >
>> > I also see a few zk timeout for solrcloud nodes causing them to be
>> shown as
>> > "dead" in the cloud view. I have increased the connection timeout to
>> around
>> > 3 minutes and still the same issue seems to be happening
>> >
>> > How do i make zk respond faster to requests and where does zk usually
>> spend
>> > time while dealing with incoming requests?
>> >
>> > Any pointers on how to move forward will be great
>> >
>> > --
>> > Best
>> > --
>> > C
>> >
>>
>
>
>
> --
> Best
> --
> C
>



-- 
Best
-- 
C


Re: Please Enable Wiki Editing

2014-03-13 Thread Erick Erickson
Done, thanks! We can always use more editors who contribute their experiences...

On Thu, Mar 13, 2014 at 8:01 PM, Greg Gilles  wrote:
> Hi,
>
> Please update my account so I can edit the wiki https://wiki.apache.org/solr.
>
> GregG  /  greg22...@yahoo.com
>
> Specifically, I was installing Solr on Windows using Tomcat following the 
> instructions on https://wiki.apache.org/solr/SolrInstall, and had some issues 
> with the instructions and wanted to update them.  I've included the issues 
> below to remind myself what changes need to be made to the wiki, or you can 
> forward this email to whomever manages the install page so they can make the 
> changes.
>
> Thanks -- Greg
>
> --
>
> Modifications from Stack Exchange that fixed several errors:
>
> 1. Changes: Remind users that on Windows, they need to copy over the logging 
> jars to the tomcat lib directory or they will get a "severe error 
> filterstart"  on tomcat start up under the Windows steps.
>
> 2. Users to need "copy up" a level the  collection1 and bin or the will get 
> an error page when trying to go to the solar/admin page.
>
> 3. Users can modify startup.bat (if that's what they are using to start 
> tomcat)  instead of setenv or web.xml  to include:
>
> set JAVA_OPTS=-Dsolr.solr.home=c:\solr_home\solr -Dsolr.velocity.enabled=false
>
>
>
> --
> Stack Exchange link:
>
> http://stackoverflow.com/questions/17619809/installing-apache-solr-4-3-1-on-apache-tomcat-6-0
>
>
>
> Install Solr in Tomcat
> Pre Requirements
> 1 - Machine with Windows OS (Windows 7,8,Xp.. ..etc)
> 2 - Java 6 or Above
> 3 - Solr 4.0.0 or Above
> 4 - Apache-tomcat 6 or Above.
> Steps to get Solr up on Tomcat Server
> 1.Install Tomcat on your machine and make sure it is ready to start.(Check 
> using localhost:8080)
> 2.Install Solr4.0 distribution package apache-solr-4.0.0.zip and unzip it in 
> your local directory like C:\apache-solr-4.0.0.
> 3.Make a folder with name solr-home in your local machine like C:\solr_home.
> 4.Go back to the solr distribution package that you downloaded 
> C:\apache-solr-4.0.0. Have a peek inside the Examples/solr 
> ("C:\solr-4.4.0\example\solr") folder. Copy all those files into the 
> C:\solr_home folder.(server shutting down exception will come)
> 5.Look into C:\solr-home\solr and you will see two folders with name 
> collection1 and bin, copy these two folders a step up to C:\solr_home.(if lib 
> not copy "severe error filterstart" Exception come)
> 6.Copy lib from C:\apache-solr-4.0.0\example\lib\ext SLF4J and log4j.jar file 
> to Tomcat Lib folder C:\Program Files\Apache Software Foundation\Tomcat 
> 6.0\lib 
> (https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty)
> 7.Copy apache-solr-4.0.war (rename to solr.war) from "C:\solr-4.4.0\dist" 
> directory to webapps directory inside Tomcat.(C:\Program Files\Apache 
> Software Foundation\Tomcat 6.0\webapps)
> 8.If tomact is already start then solr folder will create go to "C:\Program 
> Files\Apache Software Foundation\Tomcat 6.0\webapps\solr\WEB-INF\web.xml" 
> edit web.xml uncomment entry and edit like following(Exception SolrCore 
> 'collection1' is not available due to init failure)
>  solr/home 
> C:\solr_home\solr 
> java.lang.String
> 
> 9.Start Tomcat and check localhost:8080/solr dashBoard will come


Please Enable Wiki Editing

2014-03-13 Thread Greg Gilles
Hi,

Please update my account so I can edit the wiki https://wiki.apache.org/solr. 

GregG  /  greg22...@yahoo.com

Specifically, I was installing Solr on Windows using Tomcat following the 
instructions on https://wiki.apache.org/solr/SolrInstall, and had some issues 
with the instructions and wanted to update them.  I've included the issues 
below to remind myself what changes need to be made to the wiki, or you can 
forward this email to whoever manages the install page so they can make the 
changes.

Thanks -- Greg

--

Modifications from Stack Exchange that fixed several errors:

1. Remind users that on Windows, they need to copy the logging jars over to
the Tomcat lib directory or they will get a "severe error filterstart" on
Tomcat startup (add this under the Windows steps).

2. Users need to "copy up" the collection1 and bin directories one level, or
they will get an error page when trying to go to the solr/admin page.

3. Users can modify startup.bat (if that's what they are using to start
Tomcat) instead of setenv or web.xml to include:

set JAVA_OPTS=-Dsolr.solr.home=c:\solr_home\solr -Dsolr.velocity.enabled=false



--
Stack Exchange link:

http://stackoverflow.com/questions/17619809/installing-apache-solr-4-3-1-on-apache-tomcat-6-0



Install Solr in Tomcat
Pre Requirements
1 – Machine with Windows OS (Windows 7,8,Xp.. ..etc)
2 – Java 6 or Above
3 - Solr 4.0.0 or Above
4 – Apache-tomcat 6 or Above.
Steps to get Solr up on Tomcat Server
1.Install Tomcat on your machine and make sure it is ready to start.(Check 
using localhost:8080)
2.Install Solr4.0 distribution package apache-solr-4.0.0.zip and unzip it in 
your local directory like C:\apache-solr-4.0.0.
3.Make a folder with name solr-home in your local machine like C:\solr_home.
4.Go back to the solr distribution package that you downloaded 
C:\apache-solr-4.0.0. Have a peek inside the Examples/solr 
("C:\solr-4.4.0\example\solr") folder. Copy all those files into the 
C:\solr_home folder.(server shutting down exception will come)
5.Look into C:\solr-home\solr and you will see two folders with name 
collection1 and bin, copy these two folders a step up to C:\solr_home.(if lib 
not copy "severe error filterstart" Exception come)
6.Copy lib from C:\apache-solr-4.0.0\example\lib\ext SLF4J and log4j.jar file 
to Tomcat Lib folder C:\Program Files\Apache Software Foundation\Tomcat 6.0\lib 
(https://wiki.apache.org/solr/SolrLogging#Using_the_example_logging_setup_in_containers_other_than_Jetty)
7.Copy apache-solr-4.0.war (rename to solr.war) from "C:\solr-4.4.0\dist" 
directory to webapps directory inside Tomcat.(C:\Program Files\Apache Software 
Foundation\Tomcat 6.0\webapps)
8.If tomact is already start then solr folder will create go to "C:\Program 
Files\Apache Software Foundation\Tomcat 6.0\webapps\solr\WEB-INF\web.xml" edit 
web.xml uncomment entry and edit like following(Exception SolrCore 
'collection1' is not available due to init failure)
<env-entry>
   <env-entry-name>solr/home</env-entry-name>
   <env-entry-value>C:\solr_home\solr</env-entry-value>
   <env-entry-type>java.lang.String</env-entry-type>
</env-entry>
9.Start Tomcat and check localhost:8080/solr dashBoard will come


Re: Solr supports log-based recovery?

2014-03-13 Thread Otis Gospodnetic
Skimmed this, but yes, docs are durable thanks to the transaction log, which
can be replayed on start.

Otis
Solr & ElasticSearch Support
http://sematext.com/
On Mar 13, 2014 8:25 PM, "shushuai zhu"  wrote:

> Hi,
>
> I noticed the following post indicating that Solr could recover
> not-committed data from operational log:
>
>
> http://www.opensourceconnections.com/2013/04/25/understanding-solr-soft-commits-and-data-durability/
>
> which contradicts with Solr's web site:
>
> https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching
>
> that seems to indicate that data soft-committed before the last
> hard-commit is lost.
>
> I reproduced what the author did in the first post (the two lessons he
> listed) with Solr 4.7, and specifically compared below two experiments:
>
> I posted some records to Solr without commit
> I could not view the records on browser after that since I set soft-commit
> in 5 seconds
> After 5 seconds, I can view the records on browser
> Hard commit still does not happen since I set it in 60 seconds
> Kill the Solr with a kill -9 
> Keep the log file
> Re-start the Solr
> I could see the records via browser
>
> I think the hard-commit does not happen in the above experiment, since in
> a different experiment, I got:
>
> I posted some records to Solr without commit
> I could not view the records on browser after that since I set soft-commit
> in 5 seconds
> After 5 seconds, I can view the records on browser
> Hard commit still does not happen since I set it in 60 seconds
> Kill the Solr with a kill -9 
> Remove the log file
> Re-start the Solr
> I could NOT see the records via browser
>
> This means Solr supports some database-like recovery (based on log). So,
> as long as the log exists, after a crash, Solr can still recover from the
> log.
>
> Any comments or idea?
>
> Thanks.
>
> Shushuai
>


Solr supports log-based recovery?

2014-03-13 Thread shushuai zhu
Hi,

I noticed the following post indicating that Solr can recover uncommitted
data from the operational log:

http://www.opensourceconnections.com/2013/04/25/understanding-solr-soft-commits-and-data-durability/

which contradicts Solr's web site:

https://cwiki.apache.org/confluence/display/solr/Near+Real+Time+Searching

that seems to indicate that data soft-committed before the last hard-commit is 
lost.
 
I reproduced what the author did in the first post (the two lessons he listed) 
with Solr 4.7, and specifically compared the two experiments below:

1. I posted some records to Solr without a commit
2. I could not view the records in the browser right after that, since soft-commit is set to 5 seconds
3. After 5 seconds, I could view the records in the browser
4. A hard commit still had not happened, since it is set to 60 seconds
5. Killed Solr with kill -9
6. Kept the log file
7. Re-started Solr
8. I could see the records via the browser

I think the hard commit did not happen in the above experiment, because in a
different experiment I got:

1. I posted some records to Solr without a commit
2. I could not view the records in the browser right after that, since soft-commit is set to 5 seconds
3. After 5 seconds, I could view the records in the browser
4. A hard commit still had not happened, since it is set to 60 seconds
5. Killed Solr with kill -9
6. Removed the log file
7. Re-started Solr
8. I could NOT see the records via the browser

This means Solr supports some database-like recovery (based on the log). So, as 
long as the log exists, Solr can still recover from it after a crash.

Any comments or ideas?

Thanks.

Shushuai
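
For reference, the replay behavior discussed in this thread depends on the
update log being enabled in solrconfig.xml. A sketch of a configuration
matching the timings described above (standard element names; the log
directory is illustrative):

<updateHandler class="solr.DirectUpdateHandler2">
  <!-- transaction log: this is what gets replayed after a crash -->
  <updateLog>
    <str name="dir">${solr.ulog.dir:}</str>
  </updateLog>
  <!-- hard commit every 60 seconds, as in the experiments -->
  <autoCommit>
    <maxTime>60000</maxTime>
    <openSearcher>false</openSearcher>
  </autoCommit>
  <!-- soft commit every 5 seconds, which is why the records become visible -->
  <autoSoftCommit>
    <maxTime>5000</maxTime>
  </autoSoftCommit>
</updateHandler>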


Re: Help me understand these newrelic graphs

2014-03-13 Thread Software Dev
Ahh... it's including the add operation. That makes sense then. A bit silly
on NR's part that they don't break it down.

Otis, our index is only 8G so I don't consider that big by any means but
our queries can get a bit complex with a bit of faceting. Do you still
think it makes sense to shard? How easy would this be to get working?


On Thu, Mar 13, 2014 at 4:02 PM, Otis Gospodnetic <
otis.gospodne...@gmail.com> wrote:

> Hi,
>
> I think NR has support for breaking by handler, no?  Just checked - no.
>  Only webapp controller, but that doesn't apply to Solr.
>
> SPM should be more helpful when it comes to monitoring Solr - you can
> filter by host, handler, collection/core, etc. -- you can see the demo -
> https://apps.sematext.com/demo - though this is plain Solr, not SolrCloud.
>
> If your index is big or queries are complex, shard it and parallelize
> search.
>
> Otis
> --
> Performance Monitoring * Log Analytics * Search Analytics
> Solr & Elasticsearch Support * http://sematext.com/
>
>
> On Thu, Mar 13, 2014 at 6:17 PM, ralph tice  wrote:
>
> > I think your response time is including the average response for an add
> > operation, which generally returns very quickly and due to sheer number
> are
> > averaging out the response time of your queries.  New Relic should break
> > out requests based on which handler they're hitting but they don't seem
> to.
> >
> >
> > On Thu, Mar 13, 2014 at 2:18 PM, Software Dev  > >wrote:
> >
> > > Here are some screen shots of our Solr Cloud cluster via Newrelic
> > >
> > > http://postimg.org/gallery/2hyzyeyc/
> > >
> > > We currently have a 5 node cluster and all indexing is done on separate
> > > machines and shipped over. Our machines are running on SSD's with 18G
> of
> > > ram (Index size is 8G). We only have 1 shard at the moment with
> replicas
> > on
> > > all 5 machines. I'm guessing thats a bit of a waste?
> > >
> > > How come when we do our bulk updating the response time actually
> > decreases?
> > > I would think the load would be higher therefor response time should be
> > > higher. Any way I can decrease the response time?
> > >
> > > Thanks
> > >
> >
>


Re: Help me understand these newrelic graphs

2014-03-13 Thread Otis Gospodnetic
Hi,

I think NR has support for breaking things down by handler, no?  Just checked - no.
Only by webapp controller, but that doesn't apply to Solr.

SPM should be more helpful when it comes to monitoring Solr - you can
filter by host, handler, collection/core, etc. -- you can see the demo -
https://apps.sematext.com/demo - though this is plain Solr, not SolrCloud.

If your index is big or queries are complex, shard it and parallelize
search.

Otis
--
Performance Monitoring * Log Analytics * Search Analytics
Solr & Elasticsearch Support * http://sematext.com/


On Thu, Mar 13, 2014 at 6:17 PM, ralph tice  wrote:

> I think your response time is including the average response for an add
> operation, which generally returns very quickly and due to sheer number are
> averaging out the response time of your queries.  New Relic should break
> out requests based on which handler they're hitting but they don't seem to.
>
>
> On Thu, Mar 13, 2014 at 2:18 PM, Software Dev  >wrote:
>
> > Here are some screen shots of our Solr Cloud cluster via Newrelic
> >
> > http://postimg.org/gallery/2hyzyeyc/
> >
> > We currently have a 5 node cluster and all indexing is done on separate
> > machines and shipped over. Our machines are running on SSD's with 18G of
> > ram (Index size is 8G). We only have 1 shard at the moment with replicas
> on
> > all 5 machines. I'm guessing thats a bit of a waste?
> >
> > How come when we do our bulk updating the response time actually
> decreases?
> > I would think the load would be higher therefor response time should be
> > higher. Any way I can decrease the response time?
> >
> > Thanks
> >
>


Re: Help me understand these newrelic graphs

2014-03-13 Thread Ahmet Arslan
Hi,

Ralph's comment makes sense; we can confirm his explanation. What happens when 
you select only QueryComponent and FacetComponent in the first graph (request 
response time)? 



On Friday, March 14, 2014 12:18 AM, ralph tice  wrote:
I think your response time includes the average response for add
operations, which generally return very quickly and, due to their sheer number,
are averaging out the response time of your queries.  New Relic should break
out requests based on which handler they're hitting, but they don't seem to.



On Thu, Mar 13, 2014 at 2:18 PM, Software Dev wrote:

> Here are some screen shots of our Solr Cloud cluster via Newrelic
>
> http://postimg.org/gallery/2hyzyeyc/
>
> We currently have a 5 node cluster and all indexing is done on separate
> machines and shipped over. Our machines are running on SSD's with 18G of
> ram (Index size is 8G). We only have 1 shard at the moment with replicas on
> all 5 machines. I'm guessing thats a bit of a waste?
>
> How come when we do our bulk updating the response time actually decreases?
> I would think the load would be higher therefor response time should be
> higher. Any way I can decrease the response time?
>
> Thanks
>



Re: Help me understand these newrelic graphs

2014-03-13 Thread ralph tice
I think your response time includes the average response for add
operations, which generally return very quickly and, due to their sheer number,
are averaging out the response time of your queries.  New Relic should break
out requests based on which handler they're hitting, but they don't seem to.


On Thu, Mar 13, 2014 at 2:18 PM, Software Dev wrote:

> Here are some screen shots of our Solr Cloud cluster via Newrelic
>
> http://postimg.org/gallery/2hyzyeyc/
>
> We currently have a 5 node cluster and all indexing is done on separate
> machines and shipped over. Our machines are running on SSD's with 18G of
> ram (Index size is 8G). We only have 1 shard at the moment with replicas on
> all 5 machines. I'm guessing thats a bit of a waste?
>
> How come when we do our bulk updating the response time actually decreases?
> I would think the load would be higher therefor response time should be
> higher. Any way I can decrease the response time?
>
> Thanks
>


Re: single node causing cluster-wide outage

2014-03-13 Thread Avishai Ish-Shalom
A little more information: it seems the issue happens after we get an
OutOfMemoryError on a facet query.


On Wed, Mar 12, 2014 at 11:06 PM, Avishai Ish-Shalom
wrote:

> Hi all!
>
> After upgrading to Solr 4.6.1 we encountered a situation where a cluster
> outage was traced to a single node misbehaving, after restarting the node
> the cluster immediately returned to normal operation.
> The bad node had ~420 threads locked on FastLRUCache and most
> httpshardexecutor threads were waiting on apache commons http futures.
>
> Has anyone encountered such a situation? what can we do to prevent
> misbehaving nodes from bringing down the entire cluster?
>
> Cheers,
> Avishai
>


Re: Solr Cloud error with shard update

2014-03-13 Thread Shawn Heisey

On 3/13/2014 12:54 PM, cpk wrote:

We're seeing the same behavior with Solr 4.6.0 and 4.7.  DataInputHandler
loads documents, but the updates to the replica fail because of the limited
support for the BigDecimal type in SolrCloud.

We've successfully worked around the issue by setting convertType=true in
the DIH config.  This tells DIH to convert the BigDecimal to the supported
Solr type (float, double, etc) defined in your schema.xml for the field
before submitting to Solr.

In my opinion, this is more of a issue with DIH, than SolrCloud.  DIH
shouldn't try to submit the BigDecimal type, if its not well supported by
Solr.  SolrCloud should try to support BigDecimal, but that suggestion has
been pending for a while.


The real problem here is not DIH, but the JDBC driver.  The convertType 
parameter that you have set is for your JDBC driver. For most people, 
these details don't matter, because they are writing Java code 
themselves and can adjust according to the peculiarities of a specific 
JDBC driver.  DIH is a *generic* solution that can only deal with 
standard types.


BigDecimal is not a standard Java data type. Although it is included in 
the standard JVM, it is part of the *math* package; it is not built into 
Java.


http://docs.oracle.com/javase/7/docs/api/java/math/BigDecimal.html

You might wonder why DIH doesn't convert the data.  The answer is that 
without the programmer explicitly providing code to detect each 
nonstandard type, it won't know *HOW* to convert it. Solr and Lucene 
can't be expected to support every data type, especially if the data 
type is not even available until you import a class.


Thanks,
Shawn
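
To illustrate Shawn's point about hand-written indexing code, here is a small
SolrJ-style sketch; the class, column and field names are made up for the
example:

import java.math.BigDecimal;
import java.sql.ResultSet;
import java.sql.SQLException;

import org.apache.solr.common.SolrInputDocument;

public final class PriceRowMapper {

  // Builds a Solr document from one JDBC result row.
  public static SolrInputDocument toDoc(ResultSet rs) throws SQLException {
    SolrInputDocument doc = new SolrInputDocument();
    doc.addField("id", rs.getString("id"));

    BigDecimal price = rs.getBigDecimal("price");
    if (price != null) {
      // Convert the driver-specific type into a plain double that matches a
      // double field in schema.xml before the document is sent to Solr.
      doc.addField("price", price.doubleValue());
    }
    return doc;
  }
}

DIH, being generic, has no natural place for that kind of per-column decision,
which is Shawn's point.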



Help me understand these newrelic graphs

2014-03-13 Thread Software Dev
Here are some screen shots of our Solr Cloud cluster via Newrelic

http://postimg.org/gallery/2hyzyeyc/

We currently have a 5-node cluster and all indexing is done on separate
machines and shipped over. Our machines are running on SSDs with 18G of
RAM (index size is 8G). We only have 1 shard at the moment with replicas on
all 5 machines. I'm guessing that's a bit of a waste?

How come when we do our bulk updating the response time actually decreases?
I would think the load would be higher, therefore response time should be
higher. Is there any way I can decrease the response time?

Thanks


Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread Varun Rajput
Hey Shawn,

> The config with the old policy used to be the literal name
> "mergeFactor".  With TieredMergePolicy, there are now three settings
> that must be changed in order to actually be the same as what
> mergeFactor used to do.The followingconfig snippet is the equivalent
> config to a mergeFactor of 10, so these are the default settings.  If
> you don't change all three (especially segmentsPerTier), then you are
> not actually changing the "mergeFactor".
> 
>
>  10
>  10
>  30
>

I tried specifying all these configurations, but it still doesn't work as
expected. I even tried setting maxMergeSegmentMB to 20GB instead of the
default 5GB. This is the config I tried:


  2
  2
  100
  2199023220



> With newer Solr versions, there is not as much speedup to be gained from
> fewer segments as before.  There *is* a noticeable change, but it is no
> longer the night/day difference it used to be.

We did a performance test on a normal and an optimized index and saw a
considerable improvement (almost double) in response time. That's the reason
why we want to reduce our number of segments, as we have a large index with a
very small volume of updates.

> Assuming that there are no system resource limitations(especially RAM),
> a distributed index is slower than a single index of the same total
> size.  Where distributed indexes have an edge is in very large indexes
> or indexes with a moderately high query rate -- by applying more total
> RAM and/or CPU resources to the problem.  If your index already fits
> entirely into the OS disk cache, or you are sending a a handful of test
> queries, you won't notice any performance benefit from going distributed.

We have a large index which won't fit in memory and need high query rates.

> For SUPER high query rates, you need more replicas.  More shards might
> actually make performance go down in this situation.

This is something we identified while testing. We had to reduce the number
of shards to a smaller but still reasonable number that will allow us to grow
the size of the data in future.

-Varun


> I am using Solr 4.6.0 in cloud mode. The setup is of 4 shards, 1 on each
> machine with a zookeeper quorum running on 3 other machines. The index
> size
> on each shard is about 15GB. I noticed that the number of segments in
> second shard was 42 and in the remaining shards was between 25-30.
>
> I am basically trying to get the number of segments down to a reasonable
> size like 4 or 5 in order to improve the search time. We do have some
> documents indexed everyday, so we don't want to do an optimize every day.
>
> The merge factor with the TierMergePolicy is only the number of segments
> per tier. Assuming there were 5 tiers (mergeFactor of 10) in the second
> shard, I tried clearing the index, reducing the mergeFactor and
> re-indexing
> the same data in the same manner, multiple times, but I don't see a
> pattern
> of reduction in number of segments.
>
> No mergeFactor set  => 42 segments
> mergeFactor=5  =>   22 segments
> mergeFactor=2  =>   22 segments
>
> Below is the simple configuration, as specified in the documentation, I am
> using for merging:
>
> 
>
>2
>
>2
>
> 
>
> 
>
> What is the best way in which I can use merging to restrict the number of
> segments being formed? 





Re: Solr Cloud error with shard update

2014-03-13 Thread cpk
In case anyone else runs across this issue, I think we've found a
work-around.

We're seeing the same behavior with Solr 4.6.0 and 4.7.  DataImportHandler
loads documents, but the updates to the replica fail because of the limited
support for the BigDecimal type in SolrCloud.

We've successfully worked around the issue by setting convertType=true in
the DIH config.  This tells DIH to convert the BigDecimal to the supported
Solr type (float, double, etc) defined in your schema.xml for the field
before submitting to Solr.
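
For anyone looking for the exact knob: convertType is an attribute of the
JdbcDataSource element in the DIH config. A minimal sketch (driver, URL and
credentials are placeholders, not the poster's actual settings):

<dataSource type="JdbcDataSource"
            driver="com.mysql.jdbc.Driver"
            url="jdbc:mysql://localhost/mydb"
            user="solr"
            password="secret"
            convertType="true"/>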

In my opinion, this is more of an issue with DIH than SolrCloud.  DIH
shouldn't try to submit the BigDecimal type if it's not well supported by
Solr.  SolrCloud should try to support BigDecimal, but that suggestion has
been pending for a while.

Hope this helps.   

Chris





Re: Delta import throws java heap space exception

2014-03-13 Thread Richard Marquina Lopez
Hi Furkan,

sure, this is my data-config.xml:

  
  


  
  ...

  

...
  
  ...

  


Currently I have 2.1 million activities.

Thanks a lot,
Richard


2014-03-12 19:16 GMT-04:00 Furkan KAMACI :

> Hi;
>
> Could you send your data-config.xml?
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-03-13 1:01 GMT+02:00 Richard Marquina Lopez <
> richard.marqu...@gmail.com
> >:
>
> > Hi Ahmet,
> >
> > Thank you for your response, currently I have the next configuration for
> > JVM:
> > -XX:+PrintGCDetails-XX:-UseParallelGC-XX:SurvivorRatio=8-XX:NewRatio=2
> > -XX:+HeapDumpOnOutOfMemoryError-XX:PermSize=128m-XX:MaxPermSize=256m
> > -Xms1024m-Xmx2048m
> > I have 3.67 GB of physical RAM and 2GB is asigned to JVM (-Xmx2048m)
> >
> >
> > 2014-03-12 17:32 GMT-04:00 Ahmet Arslan :
> >
> > > Hi Richard,
> > >
> > > How much ram do you assign to java heap? Try increasing it to 1 gb for
> > > example.
> > > Please see : https://wiki.apache.org/solr/ShawnHeisey
> > >
> > > Ahmet
> > >
> > >
> > >
> > > On Wednesday, March 12, 2014 10:53 PM, Richard Marquina Lopez <
> > > richard.marqu...@gmail.com> wrote:
> > >
> > > Hi,
> > >
> > > I have some problems when execute the delta import with 2 million of
> rows
> > > from mysql database:
> > >
> > > java.lang.OutOfMemoryError: Java heap space
> > > at java.nio.HeapCharBuffer.(HeapCharBuffer.java:57)
> > > at java.nio.CharBuffer.allocate(CharBuffer.java:331)
> > > at
> > java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:777)
> > > at java.nio.charset.Charset.decode(Charset.java:810)
> > > at com.mysql.jdbc.StringUtils.toString(StringUtils.java:2010)
> > > at com.mysql.jdbc.ResultSetRow.getString(ResultSetRow.java:820)
> > > at com.mysql.jdbc.BufferRow.getString(BufferRow.java:541)
> > > at com.mysql.jdbc.ResultSetImpl.getStringInternal(
> > > ResultSetImpl.java:5812)
> > > at
> > com.mysql.jdbc.ResultSetImpl.getString(ResultSetImpl.java:5689)
> > > at
> > com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:4986)
> > > at
> > com.mysql.jdbc.ResultSetImpl.getObject(ResultSetImpl.java:5175)
> > > at org.apache.solr.handler.dataimport.JdbcDataSource$
> > > ResultSetIterator.getARow(JdbcDataSource.java:315)
> > > at org.apache.solr.handler.dataimport.JdbcDataSource$
> > > ResultSetIterator.access$700(JdbcDataSource.java:254)
> > > at org.apache.solr.handler.dataimport.JdbcDataSource$
> > > ResultSetIterator$1.next(JdbcDataSource.java:294)
> > > at org.apache.solr.handler.dataimport.JdbcDataSource$
> > > ResultSetIterator$1.next(JdbcDataSource.java:286)
> > > at
> > org.apache.solr.handler.dataimport.EntityProcessorBase.getNext(
> > > EntityProcessorBase.java:117)
> > > at org.apache.solr.handler.dataimport.SqlEntityProcessor.
> > > nextModifiedRowKey(SqlEntityProcessor.java:86)
> > > at org.apache.solr.handler.dataimport.EntityProcessorWrapper.
> > > nextModifiedRowKey(EntityProcessorWrapper.java:267)
> > > at org.apache.solr.handler.dataimport.DocBuilder.
> > > collectDelta(DocBuilder.java:781)
> > > at org.apache.solr.handler.dataimport.DocBuilder.doDelta(
> > > DocBuilder.java:338)
> > > at org.apache.solr.handler.dataimport.DocBuilder.execute(
> > > DocBuilder.java:223)
> > > at org.apache.solr.handler.dataimport.DataImporter.
> > > doDeltaImport(DataImporter.java:440)
> > > at org.apache.solr.handler.dataimport.DataImporter.
> > > runCmd(DataImporter.java:478)
> > > at org.apache.solr.handler.dataimport.DataImporter$1.run(
> > > DataImporter.java:457)
> > > 
> > >
> >
> --
> > >
> > > java.sql.SQLException: Streaming result set
> > > com.mysql.jdbc.RowDataDynamic@47a034e7
> > > is still active.
> > > No statements may be issued when any streaming result sets are open and
> > in
> > > use on a given connection.
> > > Ensure that you have called .close() on any active streaming result
> sets
> > > before attempting more queries.
> > > at
> com.mysql.jdbc.SQLError.createSQLException(SQLError.java:927)
> > > at
> com.mysql.jdbc.SQLError.createSQLException(SQLError.java:924)
> > > at com.mysql.jdbc.MysqlIO.checkForOutstandingStreamingDa
> > > ta(MysqlIO.java:3361)
> > > at com.mysql.jdbc.MysqlIO.sendCommand(MysqlIO.java:2524)
> > > at com.mysql.jdbc.MysqlIO.sqlQueryDirect(MysqlIO.java:2778)
> > > at
> > com.mysql.jdbc.ConnectionImpl.execSQL(ConnectionImpl.java:2828)
> > > at com.mysql.jdbc.ConnectionImpl.rollbackNoChecks(
> > > ConnectionImpl.java:5204)
> > > at
> > com.mysql.jdbc.ConnectionImpl.rollback(ConnectionImpl.java:5087)
> > > at
> > > com.mysql.jdbc.ConnectionImpl.realClose(ConnectionImpl.java:4690)
> > > at
> com.mysql.jdbc.C

Re: Solr 4 Dynamic filed : Indexing and Searching

2014-03-13 Thread Furkan KAMACI
Hi;

When I check my documents I see an example: "meta_keywords". It should
work. You may have a problem on the Nutch side. Here is a link for it:
http://wiki.apache.org/nutch/IndexMetatags On the other hand, dynamic fields
in Solr are explained here:
https://cwiki.apache.org/confluence/display/solr/Dynamic+Fields

Thanks;
Furkan KAMACI


2014-03-13 20:02 GMT+02:00 Shanaka Jayasundera :

> Hi Furkan,
>
> Thanka, I ve checked only with dynamic field as well, have you done any
> other configuration changes to get it working?
>
> Can you give me some of examples for your meta tags ex metatag.keywords ?
>
> Tx,Shanaka
>
> On Thursday, 13 March 2014, Furkan KAMACI  wrote:
>
> > Hi;
> >
> > I use Nutch and Solr to index meta tags. When you declare that:
> >
> > 
> >
> > It should work. However I have a question. You have that field for copy:
> >
> > metatag.keywords
> >
> > but your dynamic field is
> >
> > meta*_**
> >
> > I mean it should have underscore after meta. It may be wrong for you?
> >
> > Thanks;
> > Furkan KAMACI
> >
> >
> > 2014-03-13 16:09 GMT+02:00 Shanaka Jayasundera  
> > >:
> >
> > > Hello Team,
> > >
> > > I am trying to index meta data of html pages, my setup is Nutch 2.2.1
> and
> > > Solr 4.7.0
> > >
> > > I can confirm Nutch is parsing meta tags and feed data to index on
> Solr.
> > > But I am unable to see meta tags when I query data.
> > >
> > > schema.xml configuration I've done,
> > >
> > > To accept indexing meta tags I've define dynamic filed on solr,
> > schema.xml
> > > as follows,
> > >  indexed="true"/>
> > >
> > >
> > > Probably, mata tags are getting indexing on Solr but not available for
> > > querying, Is there any way to check/debug  meta tags are actually
> indexed
> > > on Solr or not ?
> > > Please note that only search field is default text and I've tried with
> > copy
> > > field as bellow but no luck
> > >
> > >  also
> > >  since
> metatag.keywords
> > > is one of the meta tag extracted from nutch.
> > >
> > > Appreciate community help on this. Thanks
> > >
> > > Shanaka
> > >
> >
>


Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread Varun Rajput
Hi Remi,

I read your post and, like you, I have also identified that running Solr
4.6.0 in cloud mode results in higher response times, which has something to
do with merging of documents from the various shards.

Looking at the source code, we couldn't understand why it would take so much
time to merge the documents. If you do find any solution, please share it
with me.

Thanks,
Varun





Re: Solr 4 Dynamic filed : Indexing and Searching

2014-03-13 Thread Shanaka Jayasundera
Hi Furkan,

Thanks, I've checked with only the dynamic field as well; have you done any
other configuration changes to get it working?

Can you give me some examples of your meta tags, e.g. metatag.keywords?

Tx,Shanaka

On Thursday, 13 March 2014, Furkan KAMACI  wrote:

> Hi;
>
> I use Nutch and Solr to index meta tags. When you declare that:
>
> 
>
> It should work. However I have a question. You have that field for copy:
>
> metatag.keywords
>
> but your dynamic field is
>
> meta*_**
>
> I mean it should have underscore after meta. It may be wrong for you?
>
> Thanks;
> Furkan KAMACI
>
>
> 2014-03-13 16:09 GMT+02:00 Shanaka Jayasundera 
> 
> >:
>
> > Hello Team,
> >
> > I am trying to index meta data of html pages, my setup is Nutch 2.2.1 and
> > Solr 4.7.0
> >
> > I can confirm Nutch is parsing meta tags and feed data to index on Solr.
> > But I am unable to see meta tags when I query data.
> >
> > schema.xml configuration I've done,
> >
> > To accept indexing meta tags I've define dynamic filed on solr,
> schema.xml
> > as follows,
> > 
> >
> >
> > Probably, mata tags are getting indexing on Solr but not available for
> > querying, Is there any way to check/debug  meta tags are actually indexed
> > on Solr or not ?
> > Please note that only search field is default text and I've tried with
> copy
> > field as bellow but no luck
> >
> >  also
> >  since metatag.keywords
> > is one of the meta tag extracted from nutch.
> >
> > Appreciate community help on this. Thanks
> >
> > Shanaka
> >
>


Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread Shawn Heisey

On 3/13/2014 1:44 AM, Varun Rajput wrote:

I am using Solr 4.6.0 in cloud mode. The setup is of 4 shards, 1 on each
machine with a zookeeper quorum running on 3 other machines. The index size
on each shard is about 15GB. I noticed that the number of segments in
second shard was 42 and in the remaining shards was between 25-30.

I am basically trying to get the number of segments down to a reasonable
size like 4 or 5 in order to improve the search time. We do have some
documents indexed everyday, so we don't want to do an optimize every day.

The merge factor with the TierMergePolicy is only the number of segments
per tier. Assuming there were 5 tiers (mergeFactor of 10) in the second
shard, I tried clearing the index, reducing the mergeFactor and re-indexing
the same data in the same manner, multiple times, but I don't see a pattern
of reduction in number of segments.

No mergeFactor set  => 42 segments
mergeFactor=5  =>   22 segments
mergeFactor=2  =>   22 segments

Below is the simple configuration, as specified in the documentation, I am
using for merging:



   2

   2





What is the best way in which I can use merging to restrict the number of
segments being formed?


The config with the old policy used to be the literal name 
"mergeFactor".  With TieredMergePolicy, there are now three settings 
that must be changed in order to actually be the same as what 
mergeFactor used to do.  The following config snippet is the equivalent 
config to a mergeFactor of 10, so these are the default settings.  If 
you don't change all three (especially segmentsPerTier), then you are 
not actually changing the "mergeFactor".


  
  <mergePolicy class="org.apache.lucene.index.TieredMergePolicy">
    <int name="maxMergeAtOnce">10</int>
    <int name="segmentsPerTier">10</int>
    <int name="maxMergeAtOnceExplicit">30</int>
  </mergePolicy>
  

With newer Solr versions, there is not as much speedup to be gained from 
fewer segments as before.  There *is* a noticeable change, but it is no 
longer the night/day difference it used to be.



Also, we are moving from Solr 1.4 (Master-Slave) to Solr 4.6.0 Cloud and
see a great increase in response time from about 18ms to 150ms. Is this a
known issue? Is there no way to reduce the response time? In the MBeans,
the individual cores show the /select handler attributes having search
times around 8ms. What is it that causes the overall response time to
increase so much?


Assuming that there are no system resource limitations (especially RAM), 
a distributed index is slower than a single index of the same total 
size.  Where distributed indexes have an edge is in very large indexes 
or indexes with a moderately high query rate -- by applying more total 
RAM and/or CPU resources to the problem.  If your index already fits 
entirely into the OS disk cache, or you are sending a handful of test 
queries, you won't notice any performance benefit from going distributed.


For SUPER high query rates, you need more replicas.  More shards might 
actually make performance go down in this situation.


You can run a single shard with SolrCloud -- there's nothing saying the 
index HAS to be distributed.


Thanks,
Shawn



Re: solr result in miliseconds

2014-03-13 Thread Ahmet Arslan


Hi,

Oops, I miswrote: it is omitHeader, not omitHeaders.
Please see : http://wiki.apache.org/solr/CommonQueryParameters#omitHeader


Ahmet
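
For reference, with the header left on (omitHeader=false) the timing shows up
in the responseHeader section of every response; a sketch of what that looks
like (values are illustrative):

<response>
  <lst name="responseHeader">
    <int name="status">0</int>
    <int name="QTime">4</int>  <!-- query time in milliseconds -->
  </lst>
  ...
</response>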

On Thursday, March 13, 2014 6:37 PM, Ahmet Arslan  wrote:
Hi Kishan,

Solr response already includes that info in QTime section. Aren't you seeing 
it? If you don't see it try setting &omitHeaders=false







On Thursday, March 13, 2014 6:12 PM, Kishan Parmar  wrote:
Hello,
how to get milliseconds result function in solr gives result in milliseconds
like
-->7 result found in 0.00456 milliseconds.

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



Re: solr result in miliseconds

2014-03-13 Thread Ahmet Arslan
Hi Kishan,

Solr response already includes that info in QTime section. Aren't you seeing 
it? If you don't see it try setting &omitHeaders=false






On Thursday, March 13, 2014 6:12 PM, Kishan Parmar  wrote:
Hello,
how to get milliseconds result function in solr gives result in milliseconds
like
-->7 result found in 0.00456 milliseconds.

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!



solr result in miliseconds

2014-03-13 Thread Kishan Parmar
Hello,
how do I get the query time in milliseconds? I want Solr to give results in
milliseconds, like:
--> 7 results found in 0.00456 milliseconds.

Regards,

Kishan Parmar
Software Developer
+91 95 100 77394
Jay Shree Krishnaa !!


Re: Solr 4 Dynamic filed : Indexing and Searching

2014-03-13 Thread Furkan KAMACI
Hi;

I use Nutch and Solr to index meta tags. When you declare that:



It should work. However I have a question. You have that field for copy:

metatag.keywords

but your dynamic field is

meta*_**

I mean it should have underscore after meta. It may be wrong for you?

Thanks;
Furkan KAMACI


2014-03-13 16:09 GMT+02:00 Shanaka Jayasundera :

> Hello Team,
>
> I am trying to index meta data of html pages, my setup is Nutch 2.2.1 and
> Solr 4.7.0
>
> I can confirm Nutch is parsing meta tags and feed data to index on Solr.
> But I am unable to see meta tags when I query data.
>
> schema.xml configuration I've done,
>
> To accept indexing meta tags I've define dynamic filed on solr, schema.xml
> as follows,
> 
>
>
> Probably, mata tags are getting indexing on Solr but not available for
> querying, Is there any way to check/debug  meta tags are actually indexed
> on Solr or not ?
> Please note that only search field is default text and I've tried with copy
> field as bellow but no luck
>
>  also
>  since metatag.keywords
> is one of the meta tag extracted from nutch.
>
> Appreciate community help on this. Thanks
>
> Shanaka
>


Re: Partial Counts in SOLR

2014-03-13 Thread Salman Akram
1- Solr 4.6
2- We do, but right now I am talking about plain keyword queries just sorted
by date. Once this is better we will start looking into caches, which we
have already changed a little.
3- As I said, the contents are not stored in this index. Some other metadata
fields are, but with normal queries it's super fast, so I guess even if I
change that it will make only a minor difference. We have SSDs and they are
quite fast too.
4- That's something we need to do, but even under low workload those queries
take a lot of time.
5- Every 10 mins, and currently no autowarming, as user queries are rarely
the same; also, once it's fully warmed those queries are still slow.
6- Nope.

On Thu, Mar 13, 2014 at 5:38 PM, Dmitry Kan  wrote:

> 1. What is your solr version? In 4.x family the proximity searches have
> been optimized among other query types.
> 2. Do you use the filter queries? What is the situation with the cache
> utilization ratios? Optimize (= i.e. bump up the respective cache sizes) if
> you have low hitratios and many evictions.
> 3. Can you avoid storing some fields and only index them? When the field is
> stored and it is retrieved in the result, there are couple of disk seeks
> per field=> search slows down. Consider SSD disks.
> 4. Do you monitor your system in terms of RAM / cache stats / GC? Do you
> observe STW GC pauses?
> 5. How often do you commit & do you have the autowarming / external warming
> configured?
> 6. If you use faceting, consider storing DocValues for facet fields.
>
> some solr wiki docs:
>
> https://wiki.apache.org/solr/SolrPerformanceProblems?highlight=%28%28SolrPerformanceFactors%29%29
>
>
>
>
>
> On Thu, Mar 13, 2014 at 8:52 AM, Salman Akram <
> salman.ak...@northbaysolutions.net> wrote:
>
> > Well some of the searches take minutes.
> >
> > Below are some stats about this particular index that I am talking about:
> >
> > Index size = 400GB (Using CommonGrams so without that the index is around
> > 180GB)
> > Position File = 280GB
> > Total Docs = 170 million (just indexed for searching - for highlighting
> > contents are stored in another index)
> > Avg Doc Size = Few hundred KBs
> > RAM = 384GB (it has other indexes too but still OS cache can have 60-80%
> of
> > the total index cached)
> >
> > Phrase queries run pretty fast with CG but complex versions of wildcard
> and
> > proximity queries can be really slow. I know using CG will make them slow
> > but they just take too long. By default sorting is on date but users have
> > few other parameters too on which they can sort.
> >
> > I wanted to avoid creating multiple indexes (maybe based on years) but
> > seems that to search on partial data that's the only feasible way.
> >
> >
> >
> >
> > On Wed, Mar 12, 2014 at 2:47 PM, Dmitry Kan 
> wrote:
> >
> > > As Hoss pointed out above, different projects have different
> > requirements.
> > > Some want to sort by date of ingestion reverse, which means that having
> > > posting lists organized in a reverse order with the early termination
> is
> > > the way to go (no such feature in Solr directly). Some other projects
> > want
> > > to collect all docs matching a query, and then sort by rank, but you
> > cannot
> > > guarantee, that the most recently inserted document is the most
> relevant
> > in
> > > terms of your ranking.
> > >
> > >
> > > Do your current searches take too long?
> > >
> > >
> > > On Tue, Mar 11, 2014 at 11:51 AM, Salman Akram <
> > > salman.ak...@northbaysolutions.net> wrote:
> > >
> > > > Its a long video and I will definitely go through it but it seems
> this
> > is
> > > > not possible with SOLR as it is?
> > > >
> > > > I just thought it would be quite a common issue; I mean generally for
> > > > search engines its more important to show the first page results,
> > rather
> > > > than using timeAllowed which might not even return a single result.
> > > >
> > > > Thanks!
> > > >
> > > >
> > > > --
> > > > Regards,
> > > >
> > > > Salman Akram
> > > >
> > >
> > >
> > >
> > > --
> > > Dmitry
> > > Blog: http://dmitrykan.blogspot.com
> > > Twitter: http://twitter.com/dmitrykan
> > >
> >
> >
> >
> > --
> > Regards,
> >
> > Salman Akram
> >
>
>
>
> --
> Dmitry
> Blog: http://dmitrykan.blogspot.com
> Twitter: http://twitter.com/dmitrykan
>



-- 
Regards,

Salman Akram


Solr 4 Dynamic filed : Indexing and Searching

2014-03-13 Thread Shanaka Jayasundera
Hello Team,

I am trying to index metadata of HTML pages; my setup is Nutch 2.2.1 and
Solr 4.7.0.

I can confirm Nutch is parsing meta tags and feed data to index on Solr.
But I am unable to see meta tags when I query data.

schema.xml configuration I've done,

To accept indexing of meta tags I've defined a dynamic field in Solr's schema.xml
as follows,



Probably, meta tags are getting indexed on Solr but not available for
querying. Is there any way to check/debug whether meta tags are actually indexed
on Solr or not?
Please note that the only search field is the default text field and I've tried with a copy
field as below, but no luck

 also
 since metatag.keywords
is one of the meta tags extracted from Nutch.

Appreciate community help on this. Thanks

Shanaka
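
For anyone debugging a similar setup: one way to check whether a dynamic field
actually received indexed terms is to facet on it, since faceting reads the
indexed terms. An empty facet list means nothing was indexed there. A minimal
SolrJ sketch, assuming the Solr URL and the metatag.keywords field name from above:

import java.util.List;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.FacetField;
import org.apache.solr.client.solrj.response.QueryResponse;

public class CheckIndexedTerms {
    public static void main(String[] args) throws Exception {
        // Assumed Solr location; adjust to your core/collection.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // Faceting reads the *indexed* terms of a field, so it shows whether
        // anything was indexed at all, regardless of whether the field is stored.
        SolrQuery query = new SolrQuery("*:*");
        query.setRows(0);
        query.setFacet(true);
        query.addFacetField("metatag.keywords");
        query.setFacetLimit(20);
        query.setFacetMinCount(1);

        QueryResponse response = server.query(query);
        FacetField field = response.getFacetField("metatag.keywords");
        List<FacetField.Count> counts = field.getValues();
        if (counts == null || counts.isEmpty()) {
            System.out.println("No indexed terms found in metatag.keywords");
        } else {
            for (FacetField.Count count : counts) {
                System.out.println(count.getName() + " -> " + count.getCount());
            }
        }
    }
}

The Schema Browser in the admin UI shows the same top-terms information interactively.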


RE: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
Ok. Maybe I found the problem:

in the solrconfig.xml I have the extract handler's lowernames parameter set to true.

I set it to false and now rmDocumentTitle is there too...

Regards
Francesco
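
For completeness: the same behaviour can also be controlled per request instead
of in solrconfig.xml, since lowernames is a regular ExtractingRequestHandler
parameter. A small sketch based on the ContentStreamUpdateRequest code later in
this thread (file path and id value are placeholders):

import java.io.File;

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.request.AbstractUpdateRequest;
import org.apache.solr.client.solrj.request.ContentStreamUpdateRequest;

public class ExtractKeepFieldCase {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        ContentStreamUpdateRequest req = new ContentStreamUpdateRequest("/update/extract");
        req.addFile(new File("/tmp/sample.pdf"), "application/pdf"); // placeholder file

        req.setParam("literal.id", "doc-1");
        req.setParam("literal.rmDocumentTitle", "test title");
        // Keep mixed-case literal field names instead of folding them to lowercase.
        req.setParam("lowernames", "false");
        req.setParam("uprefix", "ignored_");

        req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);
        server.request(req);
    }
}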

-Original Message-
From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch] 
Sent: Donnerstag, 13. März 2014 14:39
To: solr-user@lucene.apache.org
Subject: RE: Problem adding fields when indexing a pdf (add-on)

Yes, in my test class I always do server.deleteByQuery("*:*", 5); at first.

As you can see I have fullText and signatureField defined. And they are there.
The only difference is that they are not manually set.
Can it be, that if you use the literal.* parameter you have to use lowercase?

Regards
Francesco

-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Donnerstag, 13. März 2014 14:35
To: solr-user@lucene.apache.org
Subject: Re: Problem adding fields when indexing a pdf (add-on)

On 13 March 2014 18:33, Croci  Francesco Luigi (ID SWS)  
wrote:
> Ok, I renamed the field "rmDocumentTitle" to "rmdocumenttitle" and now the
> field is there!
>
> Are there any naming rules for field names? No uppercase?

No. We have used mixed-case names in the past.

Are you sure that you reindexed the first time before checking?

Regards,
Gora


RE: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
Yes, in my test class I always do server.deleteByQuery("*:*", 5); at first.

As you can see I have fullText and signatureField defined. And they are there.
The only difference is that they are not manually set.
Can it be, that if you use the literal.* parameter you have to use lowercase?

Regards
Francesco

-Original Message-
From: Gora Mohanty [mailto:g...@mimirtech.com] 
Sent: Donnerstag, 13. März 2014 14:35
To: solr-user@lucene.apache.org
Subject: Re: Problem adding fields when indexing a pdf (add-on)

On 13 March 2014 18:33, Croci  Francesco Luigi (ID SWS)  
wrote:
> Ok, I renamed the field "rmDocumentTitle" to "rmdocumenttitle" and now the
> field is there!
>
> Are there any naming rules for field names? No uppercase?

No. We have used mixed-case names in the past.

Are you sure that you reindexed the first time before checking?

Regards,
Gora


Re: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Gora Mohanty
On 13 March 2014 18:33, Croci  Francesco Luigi (ID SWS)
 wrote:
> Ok, I renamed the field "rmDocumentTitle" to "rmdocumenttitle" and now the
> field is there!
>
> Are there any naming rules for field names? No uppercase?

No. We have used mixed-case names in the past.

Are you sure that you reindexed the first time before checking?

Regards,
Gora


RE: Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
Ok, I renamed the field "rmDocumentTitle" to "rmdocumenttitle" and now the
field is there!

Are there any naming rules for field names? No uppercase?

Greetings
Francesco

-Original Message-
From: Croci Francesco Luigi (ID SWS) [mailto:fcr...@id.ethz.ch] 
Sent: Donnerstag, 13. März 2014 13:57
To: solr-user@lucene.apache.org
Subject: Problem adding fields when indexing a pdf (add-on)

I tried to define a new field "test" in the schema () and added 
req.setParam("literal.test", "test title"); in the code.

The field (test) is there O_O.

Can someone explain the difference to me? Why is rmDocumentTitle not there while
test is?

Ciao
Francesco




Problem adding fields when indexing a pdf (add-on)

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
I tried to define a new field "test" in the schema () and added 
req.setParam("literal.test", "test title"); in the code.

The field (test) is there O_O.

Can someone explain the difference to me? Why is rmDocumentTitle not there while
test is?

Ciao
Francesco




Problem adding fields when indexing a pdf

2014-03-13 Thread Croci Francesco Luigi (ID SWS)
When I index a PDF I would like to "manually" add the document's title in a
field named rmDocumentTitle.

I defined the field in schema.xml, but when I query Solr I see that the
field was not created...

Am I doing something wrong?

Below the code snippet, schema and solrconfig.xml

Thank you for any hint
Francesco

...
ContentStreamUpdateRequest req = new 
ContentStreamUpdateRequest("/update/extract");
req.addContentStream(contentStream);

req.setParam("literal.id", file.getName().substring(0, 
file.getName().indexOf('.')));
req.setParam("literal.rmDocumentTitle", "test title");
req.setParam("uprefix", "ignored_");

req.setAction(AbstractUpdateRequest.ACTION.COMMIT, true, true);

NamedList result = server.request(req);
...


schema.xml




   
   
   
   
   
   




   
   
   



   
   
   
   
   


fullText


id



solrconfig.xml



LUCENE_45











   
   deduplication
   



   
   true
   true
   false
   true
   true
   ignored_
   link
   fullText
   
   deduplication
   



   
   false
   signatureField
   true
   content
   10
   .2
   solr.update.processor.TextProfileSignature
   
   
   




none


   *:*





RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 uppwards

2014-03-13 Thread Andreas Owen
I have gotten nearly everything to work. There are two queries where I don't get
back what I want.

"avaloq frage 1" -> only returns if I set minGramSize=1 while
indexing
"yh_cug" -> the query parser doesn't remove "_" but the
indexer does (WDF), so there is no match

Is there a way to also query the whole term "avaloq frage 1" without tokenizing
it?
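
One possible approach for the whole-term case, sketched here with a hypothetical
untokenized copy field: keep the ngram field for partial matching, copyField the
same source into a keyword-tokenized field, and query that copy as a phrase. The
query side from SolrJ could look roughly like this:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class WholeTermQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        // "thema_exact" is a hypothetical copyField target analyzed with something
        // like KeywordTokenizer + LowerCaseFilter, so the whole value stays one term.
        String userInput = "avaloq frage 1";
        // Quote the value so the query parser hands the whole string to the
        // field analyzer in one piece (real user input should also be escaped).
        SolrQuery query = new SolrQuery("thema_exact:\"" + userInput + "\"");

        QueryResponse response = server.query(query);
        System.out.println("hits: " + response.getResults().getNumFound());
    }
}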

Fieldtype:


   


 
 
 
  


   
   


 
 


  
 


-Original Message-
From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Mittwoch, 12. März 2014 18:39
To: solr-user@lucene.apache.org
Subject: RE: Re[2]: NOT SOLVED searches for single char tokens instead of from 
3 uppwards

Hi Jack,

Do you know how I can use local parameters in my solrconfig? The params are
visible in the debugQuery output but Solr doesn't parse them.


{!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO 
*]) (+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *]) 


-Original Message-
From: Andreas Owen [mailto:a...@conx.ch]
Sent: Mittwoch, 12. März 2014 14:44
To: solr-user@lucene.apache.org
Subject: Re[2]: NOT SOLVED searches for single char tokens instead of from 3 
uppwards

Yes, that is exactly what happened in the analyzer. The term I searched for was
listed on both sides (index & query).

here's the rest:










  

-Original-Nachricht- 
> Von: "Jack Krupansky" 
> An: solr-user@lucene.apache.org
> Datum: 12/03/2014 13:25
> Betreff: Re: NOT SOLVED searches for single char tokens instead of 
> from 3 uppwards
> 
> You didn't show the new index analyzer - it's tricky to assure that 
> index and query are compatible, but the Admin UI Analysis page can help.
> 
> Generally, using pure defaults for WDF is not what you want, 
> especially for query time. Usually there needs to be a slight 
> asymmetry between index and query for WDF - index generates more terms than 
> query.
> 
> -- Jack Krupansky
> 
> -Original Message-
> From: Andreas Owen
> Sent: Wednesday, March 12, 2014 6:20 AM
> To: solr-user@lucene.apache.org
> Subject: RE: NOT SOLVED searches for single char tokens instead of 
> from 3 uppwards
> 
> I now have the following:
> 
> 
> 
>  types="at-under-alpha.txt"/>  class="solr.LowerCaseFilterFactory"/>
>  words="lang/stopwords_de.txt" format="snowball" 
> enablePositionIncrements="true"/>   class="solr.GermanNormalizationFilterFactory"/>
> 
>   
> 
> The gui analysis shows me that wdf doesn't cut the underscore anymore 
> but it still returns 0 results?
> 
> Output:
> 
> 
>   yh_cug
>   yh_cug
>   (+DisjunctionMaxQuery((tags:yh_cug^10.0 |
> links:yh_cug^5.0 | thema:yh_cug^15.0 | plain_text:yh_cug^10.0 |
> url:yh_cug^5.0 | h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 |
> breadcrumb:yh_cug^6.0 | contentmanager:yh_cug^5.0 | title:yh_cug^20.0
> |
> editorschoice:yh_cug^200.0 | doctype:yh_cug^10.0))
> ((expiration:[1394619501862 TO *]
> (+MatchAllDocsQuery(*:*) -expiration:*))^6.0) 
> FunctionQuery((div(int(clicks),max(int(displays),const(1^8.0))/no_
> coord
>   +(tags:yh_cug^10.0 |
> links:yh_cug^5.0 |
> thema:yh_cug^15.0 | plain_text:yh_cug^10.0 | url:yh_cug^5.0 |
> h_*:yh_cug^14.0 | inhaltstyp:yh_cug^6.0 | breadcrumb:yh_cug^6.0 |
> contentmanager:yh_cug^5.0 | title:yh_cug^20.0 |
> editorschoice:yh_cug^200.0 |
> doctype:yh_cug^10.0) ((expiration:[1394619501862 TO *]
> (+*:* -expiration:*))^6.0)
> (div(int(clicks),max(int(displays),const(1^8.0
>   
>   
> yh_cug
>   
>   
> DidntFindAnySynonyms
> No synonyms found for this query.  Check 
> your synonyms file.
>   
>   
> ExtendedDismaxQParser
> 
> 
>   (expiration:[NOW TO *] OR (*:* -expiration:*))^6
> 
> 
>   (expiration:[1394619501862 TO *]
> (+MatchAllDocsQuery(*:*) -expiration:*))^6.0
> 
> 
>   div(clicks,max(displays,1))^8
> 
>   
>   
> ExtendedDismaxQParser
> 
> 
>   div(clicks,max(displays,1))^8
> 
>   
>   
> 
> 
> 
> 
> -Original Message-
> From: Jack Krupansky [mailto:j...@basetechnology.com]
> Sent: Dienstag, 11. März 2014 14:25
> To: solr-user@lucene.apache.org
> Subject: Re: NOT SOLVED searches for single char tokens instead of 
> from 3 uppwards
> 
> The usual use of an ngram filter is at index time and not at query time.
> What exactly are you trying to achieve by using ngram filtering at 
> query time as well as index time?
> 
> Generally, it is inappropriate to combine the word delimiter filter 
> with the standard tokenizer - the latter removes the punctuation that 
> normally influences how WDF treats the 

Re: Partial Counts in SOLR

2014-03-13 Thread Dmitry Kan
1. What is your solr version? In 4.x family the proximity searches have
been optimized among other query types.
2. Do you use filter queries? What is the situation with the cache
utilization ratios? Optimize (i.e. bump up the respective cache sizes) if
you have low hit ratios and many evictions.
3. Can you avoid storing some fields and only index them? When a field is
stored and retrieved in the result, there are a couple of disk seeks
per field => search slows down. Consider SSD disks.
4. Do you monitor your system in terms of RAM / cache stats / GC? Do you
observe STW GC pauses?
5. How often do you commit & do you have the autowarming / external warming
configured?
6. If you use faceting, consider storing DocValues for facet fields.

some solr wiki docs:
https://wiki.apache.org/solr/SolrPerformanceProblems?highlight=%28%28SolrPerformanceFactors%29%29
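
To make point 2 concrete, a minimal SolrJ sketch that moves a repeated
restriction out of q and into fq, where it is cached in the filterCache and
reused across queries (field names are made up):

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class FilterQueryExample {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("contents:\"project plan\"");
        // Restrictions that repeat across many user queries belong in fq:
        // each one is cached separately and does not affect scoring.
        query.addFilterQuery("date:[NOW/DAY-1YEAR TO NOW/DAY]");
        query.addFilterQuery("doctype:contract");
        query.setRows(10);

        QueryResponse response = server.query(query);
        System.out.println("numFound=" + response.getResults().getNumFound()
                + " QTime=" + response.getQTime());
    }
}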





On Thu, Mar 13, 2014 at 8:52 AM, Salman Akram <
salman.ak...@northbaysolutions.net> wrote:

> Well some of the searches take minutes.
>
> Below are some stats about this particular index that I am talking about:
>
> Index size = 400GB (Using CommonGrams so without that the index is around
> 180GB)
> Position File = 280GB
> Total Docs = 170 million (just indexed for searching - for highlighting
> contents are stored in another index)
> Avg Doc Size = Few hundred KBs
> RAM = 384GB (it has other indexes too but still OS cache can have 60-80% of
> the total index cached)
>
> Phrase queries run pretty fast with CG but complex versions of wildcard and
> proximity queries can be really slow. I know using CG will make them slow
> but they just take too long. By default sorting is on date but users have
> few other parameters too on which they can sort.
>
> I wanted to avoid creating multiple indexes (maybe based on years) but
> seems that to search on partial data that's the only feasible way.
>
>
>
>
> On Wed, Mar 12, 2014 at 2:47 PM, Dmitry Kan  wrote:
>
> > As Hoss pointed out above, different projects have different
> requirements.
> > Some want to sort by date of ingestion reverse, which means that having
> > posting lists organized in a reverse order with the early termination is
> > the way to go (no such feature in Solr directly). Some other projects
> want
> > to collect all docs matching a query, and then sort by rank, but you
> cannot
> > guarantee, that the most recently inserted document is the most relevant
> in
> > terms of your ranking.
> >
> >
> > Do your current searches take too long?
> >
> >
> > On Tue, Mar 11, 2014 at 11:51 AM, Salman Akram <
> > salman.ak...@northbaysolutions.net> wrote:
> >
> > > Its a long video and I will definitely go through it but it seems this
> is
> > > not possible with SOLR as it is?
> > >
> > > I just thought it would be quite a common issue; I mean generally for
> > > search engines its more important to show the first page results,
> rather
> > > than using timeAllowed which might not even return a single result.
> > >
> > > Thanks!
> > >
> > >
> > > --
> > > Regards,
> > >
> > > Salman Akram
> > >
> >
> >
> >
> > --
> > Dmitry
> > Blog: http://dmitrykan.blogspot.com
> > Twitter: http://twitter.com/dmitrykan
> >
>
>
>
> --
> Regards,
>
> Salman Akram
>



-- 
Dmitry
Blog: http://dmitrykan.blogspot.com
Twitter: http://twitter.com/dmitrykan


RE: use local param in solrconfig fq for access-control

2014-03-13 Thread Andreas Owen
I have given up on this idea and made a wrapper which adds an fq with the user roles
to each request.
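
In case it helps anyone with the same problem, a rough sketch of such a wrapper
on the SolrJ side. The role and organisation values are placeholders, and the
field names match the fq shown below:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

/** Appends an access-control filter to every query before it is sent to Solr. */
public class SecureSolrClient {

    private final SolrServer server;

    public SecureSolrClient(SolrServer server) {
        this.server = server;
    }

    public QueryResponse query(SolrQuery userQuery, String userRoles, String userOrgs)
            throws Exception {
        // Same structure as the fq further down: documents with no restrictions
        // at all, or documents whose roles/organisations match the current user.
        // Role/organisation values are assumed to be plain tokens (no query syntax).
        String aclFilter = "{!q.op=OR} (*:* -organisations:[\"\" TO *] -roles:[\"\" TO *])"
                + " (+organisations:(" + userOrgs + ") +roles:(" + userRoles + "))"
                + " (-organisations:[\"\" TO *] +roles:(" + userRoles + "))"
                + " (+organisations:(" + userOrgs + ") -roles:[\"\" TO *])";
        userQuery.addFilterQuery(aclFilter);
        return server.query(userQuery);
    }
}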

-Original Message-
From: Andreas Owen [mailto:a...@conx.ch] 
Sent: Dienstag, 11. März 2014 23:32
To: solr-user@lucene.apache.org
Subject: use local param in solrconfig fq for access-control

I would like to use $r and $org for access control. It has to allow the fq's
from my facets to work as well. I'm not sure if I'm doing it right or if I
should add it to a qf or to the q itself. The debugQuery output returns a parsed fq
string and in it $r and $org are printed instead of their values. How do I get
them to be interpreted? The local params are listed in the response so they
should be valid.


  {!q.op=OR} (*:* -organisations:["" TO *] -roles:["" TO *]) 
(+organisations:($org) +roles:($r)) (-organisations:["" TO *] +roles:($r)) 
(+organisations:($org) -roles:["" TO *])
 





Re: regex in Solr Query

2014-03-13 Thread Priti Solanki
Both work!!

pubdateraw:[2005 TO 2005]
pubdateraw:[20050101 TO 20051231]

Thanks Raymond for sharing the useful info as well.


On Thu, Mar 13, 2014 at 4:30 PM, Raymond Wiker  wrote:

> Regular expressions are a text-matching mechanism, so you shouldn't expect
> to be able to use them on numeric data. If your timestamps are of the form
> you indicate, you should be able to filter on pubdateraw:[2005 TO
> 2005].
>
>
> On Thu, Mar 13, 2014 at 11:45 AM, Priti Solanki  >wrote:
>
> > Hi,
> >
> >
> > I am trying to fetch all the record for 2005
> >
> > I have field(int) "pubdateraw": 20130508
> >
> > Not working - select?q=pubdateraw:/2013*/
> >
> > Not working - select?q=pubdateraw:/.2013*./
> >
> > Is it possible to have regex on int field in solr 4.5??
> >
> > to get the record with "20130508" how am i suppose to write my query in
> > solr.
> > Any reading links will be very much helpful.
> >
> > Regards,
> > Priti
> >
>


Re: regex in Solr Query

2014-03-13 Thread Raymond Wiker
Regular expressions are a text-matching mechanism, so you shouldn't expect
to be able to use them on numeric data. If your timestamps are of the form
you indicate, you should be able to filter on pubdateraw:[2005 TO
2005].
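
For reference, the same filter issued from SolrJ, a small sketch assuming
pubdateraw holds yyyyMMdd integers as described above:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;

public class YearRangeQuery {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("*:*");
        // yyyyMMdd stored as an int: everything in 2005 falls in this inclusive range.
        query.addFilterQuery("pubdateraw:[20050101 TO 20051231]");
        query.setRows(10);

        QueryResponse response = server.query(query);
        System.out.println("records in 2005: " + response.getResults().getNumFound());
    }
}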


On Thu, Mar 13, 2014 at 11:45 AM, Priti Solanki wrote:

> Hi,
>
>
> I am trying to fetch all the record for 2005
>
> I have field(int) "pubdateraw": 20130508
>
> Not working - select?q=pubdateraw:/2013*/
>
> Not working - select?q=pubdateraw:/.2013*./
>
> Is it possible to have regex on int field in solr 4.5??
>
> to get the record with "20130508" how am i suppose to write my query in
> solr.
> Any reading links will be very much helpful.
>
> Regards,
> Priti
>


Re: regex in Solr Query

2014-03-13 Thread Ahmet Arslan
Hi Priti,

That's an interesting question; I wonder about the answer myself too. Does a prefix
query work with int?
q=pubdateraw:2013*  ?

In the meantime, as a workaround, try range queries: q=pubdateraw:{20130101 TO
20131231}



On Thursday, March 13, 2014 12:45 PM, Priti Solanki  
wrote:
Hi,


I am trying to fetch all the record for 2005

I have field(int) "pubdateraw": 20130508

Not working - select?q=pubdateraw:/2013*/

Not working - select?q=pubdateraw:/.2013*./

Is it possible to have regex on int field in solr 4.5??

to get the record with "20130508" how am i suppose to write my query in
solr.
Any reading links will be very much helpful.

Regards,
Priti



regex in Solr Query

2014-03-13 Thread Priti Solanki
Hi,


I am trying to fetch all the records for 2005

I have field(int) "pubdateraw": 20130508

Not working - select?q=pubdateraw:/2013*/

Not working - select?q=pubdateraw:/.2013*./

Is it possible to have regex on an int field in Solr 4.5??

To get the record with "20130508", how am I supposed to write my query in
Solr?
Any reading links will be very much helpful.

Regards,
Priti


Re: ClassCastException when streaming response

2014-03-13 Thread Marius Dumitru Florea
On Thu, Mar 13, 2014 at 10:41 AM, Marius Dumitru Florea
 wrote:
> Hi guys,
>
> The following code
>
> server.queryAndStreamResponse(new SolrQuery("*:*"), new
> StreamingResponseCallback() {
> public void streamSolrDocument(SolrDocument doc) {
> }
> public void streamDocListInfo(long numFound, long start, Float maxScore) {
> }
> });
>
> throws
>
> Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be
> cast to java.lang.String
> at 
> org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:124)
> at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
> at 
> org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
> at 
> org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:205)
> ... 10 more
>
> streamDocListInfo and streamSolrDocument are called as expected but
> the exception is thrown after all the results are streamed, when
> building the QueryResponse to return (which is not really needed).
>
> I debugged a bit and it fails while unmarshaling:
>
> {
>   responseHeader = {
> status = 0,
> QTime = 1716,
> params = {
>   q = *:*,
>   fl = name,space,wiki,doclocale,
>   fq = type:DOCUMENT,
>   rows = 100,
>   start = 0
> }
>   },
>   response = {
> numFound=571,
> start=0,
> maxScore = 1.0,
> docs=[]
>   },

>   --> here it throws an exception because it expects a String key, but
> it gets an array..

After more debugging I found that the part that fails to be
unmarshaled here is

  facet_counts = {
facet_queries = {},
facet_fields = {wiki={},space_exact={},locale={}, ...},
facet_dates = {},
facet_ranges = {}
  },
  highlighting = {foo={},bar={}}

So this part gets broken in the marshaling / unmarshaling process when
streaming the query results. I disabled the faceting and highlighting
and there was no more exception.

Note that first I tried to disable the faceting and highlighting by calling:

query.setFacet(false);
query.setHighlight(false);

which didn't work because these calls only remove the faceting and
highlighting parameters from the query and I had the faceting and
highlighting enabled by default in the Solr configuration. I find these
methods confusing. In the end I used:

query.set("facet", false);
query.set("hl", false);

which did the trick.
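
Putting the workaround together, a minimal sketch of the streaming call with
both parameters forced off at request level:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.StreamingResponseCallback;
import org.apache.solr.common.SolrDocument;

public class StreamWithoutFacets {
    public static void stream(SolrServer server) throws Exception {
        SolrQuery query = new SolrQuery("*:*");
        // Force the parameters off at request level so server-side defaults
        // (facet=true, hl=true in solrconfig.xml) do not apply.
        query.set("facet", false);
        query.set("hl", false);

        server.queryAndStreamResponse(query, new StreamingResponseCallback() {
            @Override
            public void streamSolrDocument(SolrDocument doc) {
                // process each document as it arrives
            }

            @Override
            public void streamDocListInfo(long numFound, long start, Float maxScore) {
                // header info: total hits, offset, max score
            }
        });
    }
}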

Thankfully I don't need faceting or highlighting in my case, but this
ClassCastException thrown when streaming the query results surely
looks like a bug. Can you reproduce?

Thanks,
Marius

> }
>
> Note that on my project I have two branches
>
> * one using Solr 4.0.0 and Java 1.6
> * one using Solr 4.7.0 and Java 1.7
>
> Both produce the same exception. Do you know what could be wrong?
>
> Thanks,
> Marius


Re: Solr to return the list of matched fields

2014-03-13 Thread heaven
Hi, thank you. While it is good for visual review, it is hard to work with this
data. What I need is to build something like this:

| Name     | Twitter Profile | Topics | Site Title | Site Description | Site content |
| John Doe | Yes             | No     | Yes        | No               | Yes          |
| Jane Doe | No              | Yes    | No         | No               | Yes          |

So a user could see which field (content) matched the query. The debug output is
a little confusing: it can have many levels of nesting and is unclear to me.
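
One way to build such a matrix without walking the debug output is to request
highlighting on the candidate fields and treat "this field has a snippet for
this document" as "this field matched". A rough SolrJ sketch with hypothetical
field names matching the table above (the fields need to be stored for highlighting):

import java.util.List;
import java.util.Map;

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.SolrDocument;

public class MatchedFieldsReport {
    private static final String[] FIELDS =
            {"twitter_profile", "topics", "site_title", "site_description", "site_content"};

    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        SolrQuery query = new SolrQuery("some user query");
        query.setFields("id", "name");
        query.setHighlight(true);
        query.set("hl.requireFieldMatch", true); // only report fields that actually matched
        for (String field : FIELDS) {
            query.addHighlightField(field);
        }

        QueryResponse response = server.query(query);
        // Highlighting is keyed by uniqueKey, then by field name.
        Map<String, Map<String, List<String>>> highlighting = response.getHighlighting();

        for (SolrDocument doc : response.getResults()) {
            String id = String.valueOf(doc.getFieldValue("id"));
            Map<String, List<String>> perField = highlighting.get(id);
            StringBuilder row = new StringBuilder(String.valueOf(doc.getFieldValue("name")));
            for (String field : FIELDS) {
                boolean matched = perField != null && perField.containsKey(field)
                        && !perField.get(field).isEmpty();
                row.append(" | ").append(matched ? "Yes" : "No");
            }
            System.out.println(row);
        }
    }
}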



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Solr-to-return-the-list-of-matched-fields-tp4122613p4123347.html
Sent from the Solr - User mailing list archive at Nabble.com.


RE: Network path for data directory

2014-03-13 Thread Suresh Soundararajan
Prasi,

It is not possible to use the index files of one Solr instance for a second
instance. The reason is that while booting, the Solr instance will lock the
schema and index files to make sure another instance won't update them.

As you mentioned you want the second instance on a separate machine, you can
configure the second one as a child and the first instance as the parent, and
configure the replication interval based on your update operations on the parent
index.

Thanks,
SureshKumar.S


From: Prasi S 
Sent: Thursday, March 13, 2014 2:44 PM
To: solr-user@lucene.apache.org
Subject: Network path for data directory

Hi,
I have a Solr index directory on one machine. I want a second Solr instance on
a different server to use this index. Is it possible to specify the path of
a remote machine for the data directory?

Thanks,
Prasi

This e-mail message and any attachments are for the sole use of the intended 
recipient(s) and may contain proprietary, confidential, trade secret or 
privileged information. Any unauthorized review, use, disclosure or 
distribution is prohibited and may be a violation of law. If you are not the 
intended recipient, please contact the sender by reply e-mail and destroy all 
copies of the original message.


Network path for data directory

2014-03-13 Thread Prasi S
Hi,
I have a Solr index directory on one machine. I want a second Solr instance on
a different server to use this index. Is it possible to specify the path of
a remote machine for the data directory?

Thanks,
Prasi


RE: IDF maxDocs / numDocs

2014-03-13 Thread Markus Jelsma
Oh yes, I see what you mean. I would try SOLR-1632 to get distributed IDF,
but it seems to be broken now.
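
For reference, a docCount()-based override of the IDF explanation along the lines
discussed below might look roughly like this. It is a sketch against the Lucene
4.x TFIDFSimilarity API, untested; the phrase overload of idfExplain would need
the same treatment, and the class would be registered with a <similarity> entry
in schema.xml:

import org.apache.lucene.search.CollectionStatistics;
import org.apache.lucene.search.Explanation;
import org.apache.lucene.search.TermStatistics;
import org.apache.lucene.search.similarities.DefaultSimilarity;

/** Like DefaultSimilarity, but IDF is based on docCount() instead of maxDoc(). */
public class DocCountSimilarity extends DefaultSimilarity {

    @Override
    public Explanation idfExplain(CollectionStatistics collectionStats, TermStatistics termStats) {
        final long df = termStats.docFreq();
        // docCount() counts documents that have a value for this field;
        // it returns -1 when the codec did not record it, so fall back to maxDoc().
        long docCount = collectionStats.docCount();
        if (docCount == -1) {
            docCount = collectionStats.maxDoc();
        }
        final float idf = idf(df, docCount);
        return new Explanation(idf, "idf(docFreq=" + df + ", docCount=" + docCount + ")");
    }
}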
 
-Original message-
> From:Steven Bower 
> Sent: Wednesday 12th March 2014 21:47
> To: solr-user 
> Subject: Re: IDF maxDocs / numDocs
> 
> My problem is that both maxDoc() and docCount() both report documents that
> have been deleted in their values. Because of merging/etc.. those numbers
> can be different per replica (or at least that is what I'm seeing). I need
> a value that is consistent across replicas... I see in the comment it makes
> mention of not using IndexReader.numDocs() but there doesn't seem to me a
> way to get ahold of the IndexReader within a similarity implementation (as
> only TermStats, CollectionStats are passed in, and neither contains of ref
> to the reader)
> 
> I am contemplating just using a static value for the "number of docs" as
> this won't change dramatically often..
> 
> steve
> 
> 
> On Wed, Mar 12, 2014 at 11:18 AM, Markus Jelsma
> wrote:
> 
> > Hi Steve - it seems most similarities use CollectionStatistics.maxDoc() in
> > idfExplain but there's also a docCount(). We use docCount in all our custom
> > similarities, also because it allows you to have multiple languages in one
> > index where one is much larger than the other. The small language will have
> > very high IDF scores using maxDoc but they are proportional enough using
> > docCount(). Using docCount() also fixes SolrCloud ranking problems, unless
> > one of your replica's becomes inconsistent ;)
> >
> >
> > https://lucene.apache.org/core/4_7_0/core/org/apache/lucene/search/CollectionStatistics.html#docCount%28%29
> >
> >
> >
> > -Original message-
> > > From:Steven Bower 
> > > Sent: Wednesday 12th March 2014 16:08
> > > To: solr-user 
> > > Subject: IDF maxDocs / numDocs
> > >
> > > I am noticing the maxDocs between replicas is consistently different and
> > > that in the idf calculation it is used which causes idf scores for the
> > same
> > > query/doc between replicas to be different. obviously an optimize can
> > > normalize the maxDocs scores, but that is only temporary.. is there a way
> > > to have idf use numDocs instead (as it should be consistent across
> > > replicas)?
> > >
> > > thanks,
> > >
> > > steve
> > >
> >
> 


ClassCastException when streaming response

2014-03-13 Thread Marius Dumitru Florea
Hi guys,

The following code

server.queryAndStreamResponse(new SolrQuery("*:*"), new
StreamingResponseCallback() {
public void streamSolrDocument(SolrDocument doc) {
}
public void streamDocListInfo(long numFound, long start, Float maxScore) {
}
});

throws

Caused by: java.lang.ClassCastException: java.util.ArrayList cannot be
cast to java.lang.String
at 
org.apache.solr.common.util.JavaBinCodec.readOrderedMap(JavaBinCodec.java:124)
at org.apache.solr.common.util.JavaBinCodec.readVal(JavaBinCodec.java:188)
at org.apache.solr.common.util.JavaBinCodec.unmarshal(JavaBinCodec.java:116)
at 
org.apache.solr.client.solrj.embedded.EmbeddedSolrServer.request(EmbeddedSolrServer.java:205)
... 10 more

streamDocListInfo and streamSolrDocument are called as expected but
the exception is thrown after all the results are streamed, when
building the QueryResponse to return (which is not really needed).

I debugged a bit and it fails while unmarshaling:

{
  responseHeader = {
status = 0,
QTime = 1716,
params = {
  q = *:*,
  fl = name,space,wiki,doclocale,
  fq = type:DOCUMENT,
  rows = 100,
  start = 0
}
  },
  response = {
numFound=571,
start=0,
maxScore = 1.0,
docs=[]
  },
  --> here it throws an exception because it expects a String key, but
it gets an array..
}

Note that on my project I have two branches

* one using Solr 4.0.0 and Java 1.6
* one using Solr 4.7.0 and Java 1.7

Both produce the same exception. Do you know what could be wrong?

Thanks,
Marius


Re: Re-index Parent-Child Schema

2014-03-13 Thread Mikhail Khludnev
Hello Vijay,
You can try FieldCollapsing, Join, Block Join, or just concatenate both
fields and search for the concatenation.
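
To make the concatenation option concrete, a small indexing-side sketch. The
bookingRecordLineType field name is hypothetical and would need a suitable
multiValued string definition in the schema; it pairs each order line's values
into one token so cross-line combinations can no longer match:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.common.SolrInputDocument;

public class OrderIndexer {
    public static void main(String[] args) throws Exception {
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr");

        String orderId = "123";
        String[] bookingRecordIds = {"145", "987", "234"};
        String[] orderLineTypes = {"11", "12", "13"};

        SolrInputDocument doc = new SolrInputDocument();
        doc.addField("OrderId", orderId);
        for (int i = 0; i < bookingRecordIds.length; i++) {
            doc.addField("BookingRecordId", bookingRecordIds[i]);
            doc.addField("OrderLineType", orderLineTypes[i]);
            // Pair the values of one order line into a single token, e.g. "234_13",
            // so cross-line combinations such as 234 + 11 can no longer match.
            doc.addField("bookingRecordLineType", bookingRecordIds[i] + "_" + orderLineTypes[i]);
        }

        server.add(doc);
        server.commit();
        // Query side (for illustration): q=bookingRecordLineType:234_11
        // matches order 345 but not order 123 in the example from this thread.
    }
}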


On Thu, Mar 13, 2014 at 7:16 AM, Vijay Kokatnur wrote:

> Hi,
>
> I've inherited a Solr application with a schema that contains a parent-child
> relationship. All child elements are maintained in multi-valued fields.
> So an Order with 3 Order lines will result in an array of size 3 in Solr,
>
> This worked fine as long as clients queried only on Order, but with new
> requirements it is serving inaccurate results.
>
> Consider some orders, for example -
>
>
>  {
> OrderId:123
> BookingRecordId : ["145", "987", "*234*"]
> OrderLineType : ["11", "12", "*13*"]
> .
> }
>  {
> OrderId:345
> BookingRecordId : ["945", "882", "*234*"]
> OrderLineType : ["1", "12", "*11*"]
> .
> }
>  {
> OrderId:678
> BookingRecordId : ["444"]
> OrderLineType : ["11"]
> .
> }
>
>
> If you look up an Order with BookingRecordId:234 AND OrderLineType:11,
> you will get two orders: 123 and 345, which is correct per Solr. You
> have arrays in both orders that satisfy this condition.
>
> However, for OrderId:123, the value at the 3rd index of the OrderLineType array is
> 13 and not 11 (this is for BookingRecordId:145); this should be excluded.
>
> Per this blog :
>
> http://blog.griddynamics.com/2011/06/solr-experience-search-parent-child.html
>
> I can't use span queries as I have tons of child elements to query and I
> want to keep any changes to client queries to minimum.
>
> So is creating multiple indexes is the only way? We have 3 Physical boxes
> with SolrCloud and at some point we would like to shard.
>
> Appreciate any inputs.
>
>
> Best,
>
> -Vijay
>



-- 
Sincerely yours
Mikhail Khludnev
Principal Engineer,
Grid Dynamics


 


Re: Solr Cloud Segments and Merging Issues

2014-03-13 Thread remi tassing
Hi Varun,

I would just like to say that I have the same two problems you've mentioned
and I couldn't figure out a way to solve them.

For the 2nd I posted a question a couple of days ago, titled "Result
merging takes too long".

Remi


On Thu, Mar 13, 2014 at 3:44 PM, Varun Rajput  wrote:

> I am using Solr 4.6.0 in cloud mode. The setup is of 4 shards, 1 on each
> machine with a zookeeper quorum running on 3 other machines. The index size
> on each shard is about 15GB. I noticed that the number of segments in
> second shard was 42 and in the remaining shards was between 25-30.
>
> I am basically trying to get the number of segments down to a reasonable
> size like 4 or 5 in order to improve the search time. We do have some
> documents indexed everyday, so we don't want to do an optimize every day.
>
> The merge factor with the TierMergePolicy is only the number of segments
> per tier. Assuming there were 5 tiers (mergeFactor of 10) in the second
> shard, I tried clearing the index, reducing the mergeFactor and re-indexing
> the same data in the same manner, multiple times, but I don't see a pattern
> of reduction in number of segments.
>
> No mergeFactor set  => 42 segments
> mergeFactor=5  =>   22 segments
> mergeFactor=2  =>   22 segments
>
> Below is the simple configuration, as specified in the documentation, I am
> using for merging:
>
> 
>
>   2
>
>   2
>
> 
>
> 
>
> What is the best way in which I can use merging to restrict the number of
> segments being formed?
>
> Also, we are moving from Solr 1.4 (Master-Slave) to Solr 4.6.0 Cloud and
> see a great increase in response time from about 18ms to 150ms. Is this a
> known issue? Is there no way to reduce the response time? In the MBeans,
> the individual cores show the /select handler attributes having search
> times around 8ms. What is it that causes the overall response time to
> increase so much?
>
> -Varun
>


Re: Result merging takes too long

2014-03-13 Thread remi tassing
Hi Erick,

I've used the fl=id parameter to avoid retrieving the actual documents
(step <4> in your mail) but the problem still exists.
Any ideas on how to find the merging time (step <3>)?

Remi
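
One way to see where the time goes, sketched with SolrJ: shards.info=true
reports per-shard QTime and hit counts in the response, and debug=timing breaks
down the components on the aggregating node, so the gap between the slowest
shard and the overall QTime gives a rough idea of the merge overhead:

import org.apache.solr.client.solrj.SolrQuery;
import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;
import org.apache.solr.client.solrj.response.QueryResponse;
import org.apache.solr.common.util.NamedList;

public class DistributedTiming {
    public static void main(String[] args) throws Exception {
        // Any node of the collection will do; URL is a placeholder.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        SolrQuery query = new SolrQuery("project development agile");
        query.setFields("id");
        query.set("shards.info", true); // per-shard QTime and numFound
        query.set("debug", "timing");   // per-component timing on the aggregator

        QueryResponse response = server.query(query);
        System.out.println("overall QTime: " + response.getQTime());

        @SuppressWarnings("unchecked")
        NamedList<Object> shardsInfo =
                (NamedList<Object>) response.getResponse().get("shards.info");
        if (shardsInfo != null) {
            for (int i = 0; i < shardsInfo.size(); i++) {
                System.out.println(shardsInfo.getName(i) + " -> " + shardsInfo.getVal(i));
            }
        }
        System.out.println("timing: " + response.getDebugMap().get("timing"));
    }
}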


On Tue, Mar 11, 2014 at 7:29 PM, Erick Erickson wrote:

> In SolrCloud there are a couple of round trips
> that _may_ be what you're seeing.
>
> First, though, the QTime is the time spent
> querying, it does NOT include assembling
> the documents from disk for return etc., so
> bear that in mind
>
> But here's the sequence as I understand it
> from the receiving node's viewpoint.
> 1> send the query out to one replica for
> each shard
> 2> get the top N doc IDs and scores (
> or whatever sorting criteria) from each
> shard.
> 3> Merge the lists and select the top N
> to return
> 4> request the actual documents for
> the top N list from each of the shards
> 5> return the list.
>
> So as you can see, there's an extra
> round trip to each shard to get the
> full document. Perhaps this is what
> you're seeing? <4> seems like it
> might be what you're seeing, I don't
> think it's counted in QTime.
>
> HTH
> Erick
>
> On Tue, Mar 11, 2014 at 3:17 AM, remi tassing 
> wrote:
> > Hi,
> >
> > I've just setup a SolrCloud with Tomcat. 5 Shards with one replication
> each
> > and total 10million docs (evenly distributed).
> >
> > I've noticed the query response time is faster than using one single node
> > but still not as fast as I expected.
> >
> > After turning debugQuery on, I noticed the query time is different to the
> > value returned in the debug explanation (see some excerpt below). More
> > importantly, while making a query to one, and only one, shard then the
> > result is consistent. It appears the server spends most of its time doing
> > result aggregation (merging).
> >
> > After searching on Google in vain I didn't find anything concrete except
> > that the problem could be in 'SearchComponent'.
> >
> > Could you point me in the right direction (e.g. configuration...)?
> >
> > Thanks!
> >
> > Remi
> >
> > Solr Cloud result:
> >
> > 
> >
> > 0
> >
> > 3471
> >
> > 
> >
> > on
> >
> > project development agile
> >
> > 
> >
> > 
> >
> >  > maxScore="0.17022902">...
> >
> > ...
> >
> >
> >
> > 
> >
> > 508.0
> >
> > 
> >
> > 8.0
> >
> > 
> >
> > 8.0
> >
> > 
> >
> > 
> >
> > 0.0
> >
> > 
> >
> > 
> >
> > 0.0
> >
> > 
> >
> > 
> >
> > 0.0
> >
> > 
> >
> > 
> >
> > 0.0
> >
> > 
> >
> > 
> >
> > 0.0
> >
> > 
> >
> > 
> >
> > 
> >
> > 499.0
> >
> > 
> >
> > 195.0
> >
> > 
> >
> > 
> >
> > 0.0
> >
> > 
> >
> > 
> >
> > 0.0
> >
> > 
> >
> > 
> >
> > 228.0
> >
> > 
> >
> > 
> >
> > 0.0
> >
> > 
> >
> > 
> >
> > 76.0
> >
> > 
> >
> > 
> >
> > 
>


Solr Cloud Segments and Merging Issues

2014-03-13 Thread Varun Rajput
I am using Solr 4.6.0 in cloud mode. The setup is 4 shards, 1 on each
machine, with a ZooKeeper quorum running on 3 other machines. The index size
on each shard is about 15GB. I noticed that the number of segments in the
second shard was 42 and in the remaining shards was between 25 and 30.

I am basically trying to get the number of segments down to a reasonable
size like 4 or 5 in order to improve the search time. We do have some
documents indexed everyday, so we don't want to do an optimize every day.

The merge factor with the TieredMergePolicy is only the number of segments
per tier. Assuming there were 5 tiers (mergeFactor of 10) in the second
shard, I tried clearing the index, reducing the mergeFactor and re-indexing
the same data in the same manner, multiple times, but I don't see a pattern
of reduction in number of segments.

No mergeFactor set  => 42 segments
mergeFactor=5  =>   22 segments
mergeFactor=2  =>   22 segments

Below is the simple configuration, as specified in the documentation, I am
using for merging:



  2

  2





What is the best way in which I can use merging to restrict the number of
segments being formed?
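
One practical middle ground, sketched with SolrJ: instead of a daily full
optimize, occasionally force-merge down to a segment budget. Note that
mergeFactor maps onto maxMergeAtOnce and segmentsPerTier, which bound segments
per tier rather than the total count, which is likely why a low mergeFactor
alone did not shrink the segment count much:

import org.apache.solr.client.solrj.SolrServer;
import org.apache.solr.client.solrj.impl.HttpSolrServer;

public class SegmentBudgetMerge {
    public static void main(String[] args) throws Exception {
        // Point at one shard (or run once per shard); adjust URL to your setup.
        SolrServer server = new HttpSolrServer("http://localhost:8983/solr/collection1");

        // waitFlush=true, waitSearcher=true, merge down to at most 5 segments.
        // Cheaper than a full optimize to a single segment, and can be run
        // weekly rather than after every indexing batch.
        server.optimize(true, true, 5);
    }
}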

Also, we are moving from Solr 1.4 (Master-Slave) to Solr 4.6.0 Cloud and
see a great increase in response time from about 18ms to 150ms. Is this a
known issue? Is there no way to reduce the response time? In the MBeans,
the individual cores show the /select handler attributes having search
times around 8ms. What is it that causes the overall response time to
increase so much?

-Varun


Re: More Maintenance Releases?

2014-03-13 Thread Shawn Heisey
On 3/12/2014 6:27 PM, Erick Erickson wrote:
> Wondering if 4.7 is a "natural" point to do this.
> 
> See Uwe's announcement that as of Solr 4.8,
> Solr/Lucene will _require_ Java 1.7 rather than
> Java 1.6.
> 
> I know some organizations will not be able to
> make this transition easily, thus I suspect we'll
> see ongoing requests to "please back-port XXX
> to Solr 4.7 since we can't use Java 1.7). Hmmm,
> Solr 4.7, Java 1.7, coincidence? :).
> 
> Does it make any sense to think of essentially
> freezing 4.7 except for bug fixes that we selectively
> back-port?

It's possible, even likely, that we'll end up doing a number of 4.7.x
releases specifically because an important group of users can't upgrade
Java.  Those releases will probably happen on a longer timescale than
the minor releases, and I would not expect a large number of new
features to be backported.

The hassles involved in backporting across the Java 6 version boundary
might prove to be an accelerating factor for creating branch_5x and
getting that show on the road.

Addressing something earlier in the thread, here are my thoughts about
why we don't do very many bugfix releases:

* Based on what I've seen of the release process, there's not a huge
difference in effort for a minor release compared to a bugfix release.
Unless there are extremely severe bugs to address, release manager
volunteers would rather put that effort into new and improved features.

* The current rapid pace of development, especially in SolrCloud, causes
so much drift in the codebase that it can be VERY difficult to backport
a critical fix.  Sometimes it's difficult because the fix is embedded in
changes for a whole new feature.  Backporting new features to a bugfix
release is usually avoided, because it can introduce NEW bugs.

Thanks,
Shawn